一,实践:Section 3.6 Lab: Linear Regression
# 先加载两个基本数据集
library(MASS)
library(ISLR2)
Attaching package: ‘ISLR2’
The following object is masked from ‘package:MASS’:
Boston
# ISLR2包含Boston数据集,记录了波士顿506个人口普查区的房价中位数(medv),我们将使用12个预测指标来预测medv,如rm(每栋房子的平均房间数)、age(1940年之前建造的自住单元的比例)和lstat(社会经济地位较低的家庭的百分比)
head(Boston)
| crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | lstat | medv | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <int> | <dbl> | <dbl> | <dbl> | <dbl> | <int> | <dbl> | <dbl> | <dbl> | <dbl> | |
| 1 | 0.00632 | 18 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296 | 15.3 | 4.98 | 24.0 |
| 2 | 0.02731 | 0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 9.14 | 21.6 |
| 3 | 0.02729 | 0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 4.03 | 34.7 |
| 4 | 0.03237 | 0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 2.94 | 33.4 |
| 5 | 0.06905 | 0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 5.33 | 36.2 |
| 6 | 0.02985 | 0 | 2.18 | 0 | 0.458 | 6.430 | 58.7 | 6.0622 | 3 | 222 | 18.7 | 5.21 | 28.7 |
我们将首先使用lm()函数来拟合一个简单的线性回归lm()模型, 其中medv作为响应变量,lstat作为预测器。基本语法是lm(y ~ x, data),其中y是响应,x是预测,data是保存这两个变量的数据集。
lm.fit <- lm(medv ~ lstat, data = Boston)
# 上面是正常写法,下面是使用attach函数的写法
# attach(Boston) 将Boston数据集中的变量放入到搜索路径中
# lm.fit <- lm(medv ~ lstat)
打印lm.fit的话会输出一些关于模型的基本信息。
要获得更详细的信息,我们使用summary(lm.fit),这为我们提供了系数的P值和标准误差SE,以及模型的R2统计量和F统计量
lm.fit
Call:
lm(formula = medv ~ lstat, data = Boston)
Coefficients:
(Intercept) lstat
34.55 -0.95
summary(lm.fit)
Call:
lm(formula = medv ~ lstat, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-15.168 -3.990 -1.318 2.034 24.500
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.55384 0.56263 61.41 <2e-16 ***
lstat -0.95005 0.03873 -24.53 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.216 on 504 degrees of freedom
Multiple R-squared: 0.5441, Adjusted R-squared: 0.5432
F-statistic: 601.6 on 1 and 504 DF, p-value: < 2.2e-16
# 访问lm对象的各个部分
names(x = lm.fit) #拟合模型的各个部分
coef(lm.fit) #所估计的系数
confint(lm.fit) #所估计的系数的置信区间
- 'coefficients'
- 'residuals'
- 'effects'
- 'rank'
- 'fitted.values'
- 'assign'
- 'qr'
- 'df.residual'
- 'xlevels'
- 'call'
- 'terms'
- 'model'
- (Intercept)
- 34.5538408793831
- lstat
- -0.950049353757991
| 2.5 % | 97.5 % | |
|---|---|---|
| (Intercept) | 33.448457 | 35.6592247 |
| lstat | -1.026148 | -0.8739505 |
对于给定的lstat指,可以使用predict函数生成medv预测值的置信区间和预测区间
predict(lm.fit, data.frame(lstat = (c(5, 10, 15))),interval = "confidence") #预测值和置信区间
predict(lm.fit, data.frame(lstat = (c(5, 10, 15))),interval = "prediction") #预测值和预测区间
| fit | lwr | upr | |
|---|---|---|---|
| 1 | 29.80359 | 29.00741 | 30.59978 |
| 2 | 25.05335 | 24.47413 | 25.63256 |
| 3 | 20.30310 | 19.73159 | 20.87461 |
| fit | lwr | upr | |
|---|---|---|---|
| 1 | 29.80359 | 17.565675 | 42.04151 |
| 2 | 25.05335 | 12.827626 | 37.27907 |
| 3 | 20.30310 | 8.077742 | 32.52846 |
# 使用plot()和abline()函数绘制medv和lstat以及最小二乘回归线
plot(Boston$lstat, Boston$medv)
# 或者attach(Boston)之后直接使用plot(lstat, medv)
abline(lm.fit)
# lstat与medv之间存在一定的非线性关系
abline()函数可以用来绘制任何直线,而不仅仅是最小二乘回归直线。
为了绘制截距为a、斜率为b的直线,我们输入abline(a, b)。
下面我们对绘制直线和点进行一些额外的设置。lwd = 3命令使回归线的宽度增加3倍;这也适用于plot()和lines()函数。我们也可以使用pch选项来创建不同的绘图符号
plot(Boston$lstat, Boston$medv)
abline(lm.fit, lwd = 3)
plot(Boston$lstat, Boston$medv)
abline(lm.fit, lwd = 3, col = "red") #
attach(Boston)
plot(lstat, medv, col = "red")
plot(lstat, medv, pch = 20)
plot(lstat, medv, pch = "+")
plot(1:20, 1:20, pch = 1:20) #pch参数可以设置点的形状
接下来,我们研究一些诊断图,其中一些在第3.3.3节中讨论。
通过直接将plot()函数应用于lm()的输出,可以自动生成四个诊断图。
通常,该命令一次生成一个图形,点击Enter将生成下一个图形。然而,将所有四个图一起查看通常很方便。我们可以通过使用将显示屏幕分割成单独的面板来实现这一点,这样可以同时查看多个plot。例如,par()和par(mfrow = c(2,2)) mfrow()函数,它们将绘制区域划分为2 × 2网格的面板。
par(mfrow = c(2, 2))
plot(lm.fit)
我们可以使用residuals()函数从线性回归拟合中计算残差
函数rstudent()将返回学生化的残差,我们可以使用这个函数来绘制残差与拟合值的关系
predict(lm.fit)
- 1
- 29.8225950976683
- 2
- 25.8703897860351
- 3
- 30.7251419837384
- 4
- 31.7606957793346
- 5
- 29.490077823853
- 6
- 29.604083746304
- 7
- 22.7447274121713
- 8
- 16.3603957549176
- 9
- 6.11886372140644
- 10
- 18.3079969301215
- 11
- 15.1253315950322
- 12
- 21.9466859550146
- 13
- 19.6285655318451
- 14
- 26.7064332173421
- 15
- 24.8063345098261
- 16
- 26.5069228530529
- 17
- 28.3025161316555
- 18
- 20.6166168597534
- 19
- 23.4477639339522
- 20
- 23.837284168993
- 21
- 14.5838034633901
- 22
- 21.4146583169101
- 23
- 16.7689169770335
- 24
- 15.6668597266742
- 25
- 19.0680364131278
- 26
- 18.8685260488387
- 27
- 20.4836099502273
- 28
- 18.136988046445
- 29
- 22.3932091512808
- 30
- 23.1722496213624
- 31
- 13.0827254844525
- 32
- 22.1651973063789
- 33
- 8.22797328674918
- 34
- 17.120435237924
- 35
- 15.2298370239456
- 36
- 25.3573631350058
- 37
- 23.7137777530044
- 38
- 26.2219080469255
- 39
- 24.9298409258147
- 40
- 30.4496276711486
- 41
- 32.6727431589423
- 42
- 29.9556020071944
- 43
- 29.0340541340492
- 44
- 27.4854736874236
- 45
- 25.4808695509943
- 46
- 24.853836977514
- 47
- 21.1106425237075
- 48
- 16.6929130287329
- 49
- 5.28282029009941
- 50
- 19.1630413485037
- 51
- 21.7756770713381
- 52
- 25.5948754734452
- 53
- 29.5375802915409
- 54
- 26.5449248272032
- 55
- 20.4931104437648
- 56
- 29.9841034878072
- 57
- 29.0720561081995
- 58
- 30.801145932039
- 59
- 28.0365023126033
- 60
- 25.7943858377344
- 61
- 22.0606918774655
- 62
- 20.8351282111177
- 63
- 28.1600087285918
- 64
- 25.5283720186822
- 65
- 26.9059435816313
- 66
- 30.1171103973333
- 67
- 24.8253354969013
- 68
- 26.8584411139434
- 69
- 22.117694838691
- 70
- 26.2029070598504
- 71
- 28.1695092221294
- 72
- 25.1673532642542
- 73
- 29.309568446639
- 74
- 27.3904687520479
- 75
- 28.1125062609039
- 76
- 26.0603996567867
- 77
- 23.1817501149
- 78
- 24.7968340162885
- 79
- 22.8302318540095
- 80
- 25.9083917601854
- 81
- 29.5280797980033
- 82
- 27.6944845452504
- 83
- 28.1695092221294
- 84
- 27.4189702326606
- 85
- 25.4143660962312
- 86
- 28.3500185993434
- 87
- 22.3362061900553
- 88
- 26.5354243336657
- 89
- 29.3285694337142
- 90
- 29.1385595629626
- 91
- 26.1839060727752
- 92
- 26.7634361785676
- 93
- 26.8014381527179
- 94
- 28.654034392546
- 95
- 24.492818223086
- 96
- 28.2360126768925
- 97
- 23.7802812077675
- 98
- 30.554133100062
- 99
- 31.1621646864671
- 100
- 28.6730353796211
- 101
- 25.6043759669828
- 102
- 27.2669623360593
- 103
- 24.4548162489357
- 104
- 21.7851775648757
- 105
- 22.8397323475471
- 106
- 18.906528022989
- 107
- 16.825919938259
- 108
- 21.167645484933
- 109
- 22.8967353087726
- 110
- 19.7805734284463
- 111
- 22.2031992805292
- 112
- 24.9013394452019
- 113
- 19.1535408549661
- 114
- 18.317497423659
- 115
- 24.6258251326121
- 116
- 19.5810630641572
- 117
- 23.1152466601369
- 118
- 24.7683325356758
- 119
- 19.9515823121228
- 120
- 21.6236691747368
- 121
- 20.9016316658808
- 122
- 20.9966366012566
- 123
- 17.5194559665023
- 124
- 10.4130868003926
- 125
- 17.8519732403176
- 126
- 20.4836099502273
- 127
- 8.65549549594027
- 128
- 18.2224924882832
- 129
- 19.9325813250476
- 130
- 17.1299357314616
- 131
- 22.5832190220324
- 132
- 22.9062358023101
- 133
- 23.9892920655942
- 134
- 20.2745990924005
- 135
- 18.1084865658323
- 136
- 18.4410038396476
- 137
- 18.4980068008731
- 138
- 20.692620808054
- 139
- 14.2987886572627
- 140
- 17.0159298090106
- 141
- 11.60064849259
- 142
- 1.86264261657064
- 143
- 9.07351721159379
- 144
- 9.45353695309698
- 145
- 6.72689530781155
- 146
- 8.14246884491096
- 147
- 18.7355191393126
- 148
- 6.49888346290963
- 149
- 7.6484431809568
- 150
- 14.1752822412742
- 151
- 21.1581449913954
- 152
- 21.937185461477
- 153
- 23.0392427118363
- 154
- 19.5525615835444
- 155
- 20.1890946505623
- 156
- 20.2840995859381
- 157
- 19.2200443097291
- 158
- 30.1931143456339
- 159
- 28.4450235347192
- 160
- 27.5329761551116
- 161
- 29.3285694337142
- 162
- 32.9102554973818
- 163
- 32.7297461201678
- 164
- 31.3996770249066
- 165
- 23.4952664016401
- 166
- 25.2338567190172
- 167
- 31.0386582704785
- 168
- 23.0202417247611
- 169
- 24.0082930526694
- 170
- 23.7992821948426
- 171
- 20.8446287046553
- 172
- 23.1247471536745
- 173
- 20.5976158726782
- 174
- 25.9653947214109
- 175
- 25.3953651091561
- 176
- 29.490077823853
- 177
- 24.9488419128898
- 178
- 28.5780304442453
- 179
- 27.9794993513778
- 180
- 29.7655921364428
- 181
- 27.3714677649727
- 182
- 25.5758744863701
- 183
- 29.9746029942696
- 184
- 29.1575605500377
- 185
- 21.2721509138464
- 186
- 22.0606918774655
- 187
- 30.32612125516
- 188
- 28.2075111962797
- 189
- 30.2216158262467
- 190
- 29.4330748626275
- 191
- 29.7085891752173
- 192
- 30.0981094102581
- 193
- 31.8271992340977
- 194
- 29.7750926299804
- 195
- 30.3926247099231
- 196
- 31.7321942987219
- 197
- 30.6776395160505
- 198
- 26.3739159435268
- 199
- 28.2645141575052
- 200
- 30.2216158262467
- 201
- ⋯
- 202
- 28.4070215605689
- 203
- 27.3999692455854
- 204
- 30.2406168133218
- 205
- 25.0818488224159
- 206
- 22.5452170478821
- 207
- 28.8725457439103
- 208
- 23.4192624533394
- 209
- 27.048450984695
- 210
- 25.7373828765089
- 211
- 23.6282733111662
- 212
- 17.1394362249991
- 213
- 19.4100541804807
- 214
- 24.7113295744503
- 215
- 22.4597126060439
- 216
- 27.7134855323256
- 217
- 28.0270018190657
- 218
- 27.2384608554466
- 219
- 23.4002614662643
- 220
- 28.7395388343842
- 221
- 29.7275901622925
- 222
- 28.7110373537715
- 223
- 22.4027096448184
- 224
- 25.0818488224159
- 225
- 27.5804786227995
- 226
- 25.917892253723
- 227
- 22.7447274121713
- 228
- 27.114954439458
- 229
- 29.1575605500377
- 230
- 28.1410077415167
- 231
- 26.9439455557816
- 232
- 25.2433572125548
- 233
- 24.5213197036987
- 234
- 26.4689208789026
- 235
- 25.3003601737803
- 236
- 25.7278823829714
- 237
- 29.3380699272517
- 238
- 26.3359139693765
- 239
- 27.7324865194007
- 240
- 30.1741133585588
- 241
- 24.5498211843115
- 242
- 22.5167155672694
- 243
- 28.5115269894823
- 244
- 28.8630452503727
- 245
- 28.9580501857485
- 246
- 28.8725457439103
- 247
- 29.3380699272517
- 248
- 27.1529564136084
- 249
- 30.2786187874721
- 250
- 26.9059435816313
- 251
- 29.2620659789511
- 252
- 17.8329722532425
- 253
- 21.9466859550146
- 254
- 23.6472742982414
- 255
- 22.5167155672694
- 256
- 27.1529564136084
- 257
- 21.0726405495572
- 258
- 24.8728379645892
- 259
- 20.6451183403661
- 260
- 29.5280797980033
- 261
- 27.7894894806262
- 262
- 21.2531499267712
- 263
- 21.8896829937891
- 264
- 31.4566799861321
- 265
- 31.0101567898658
- 266
- 31.7416947922594
- 267
- 25.4998705380695
- 268
- 26.1174026180121
- 269
- 1.52062484921776
- 270
- -1.51953308280781
- 271
- 21.7851775648757
- 272
- 12.4746938980474
- 273
- 14.3747926055634
- 274
- 12.0471716888563
- 275
- 13.8617659545341
- 276
- 18.2034915012081
- 277
- 14.5268005021647
- 278
- 12.1326761306945
- 279
- 11.2206287510868
- 280
- 5.45382917377584
- 281
- 5.28282029009941
- 282
- 7.68644515510712
- 283
- 4.16176205266498
- 284
- 5.46332966731342
- 285
- 14.745311853529
- 286
- 18.2984964365839
- 287
- 16.7309150028832
- 288
- 10.1565734748779
- 289
- 20.1415921828744
- 290
- 19.02053394544
- 291
- 18.2889959430463
- 292
- 16.1513848970908
- 293
- 15.6288577525239
- 294
- 5.49183114792616
- 295
- 6.08086174725612
- 296
- 9.12101967928169
- 297
- 15.2488380110207
- 298
- 15.2583385045583
- 299
- 15.7713651555876
- 300
- 8.54148957348932
- 301
- 12.7217067300245
- 302
- 12.3796889626716
- 303
- 23.0297422182987
- 304
- 9.47253794017215
- 305
- 15.76186466205
- 306
- 24.9488419128898
- 307
- 14.3937935926385
- 308
- 1.90064459072096
- 309
- 15.4768498559226
- 310
- -0.578984222587394
- 311
- 6.95490715271347
- 312
- 10.0520680459645
- 313
- 9.24452609527023
- 314
- 14.9638232048933
- 315
- 12.9497185749264
- 316
- 20.2840995859381
- 317
- 19.6380660253826
- 318
- 21.1581449913954
- 319
- 12.4271914303595
- 320
- 18.250993968896
- 321
- 11.3821371412257
- 322
- 19.6475665189202
- 323
- 20.7591242628171
- 324
- 14.1087787865111
- 325
- 11.6766524408907
- 326
- 17.7949702790921
- 327
- 15.8473691038883
- 328
- 23.1247471536745
- 329
- 19.1440403614285
- 330
- 20.1415921828744
- 331
- 12.4461924174347
- 332
- 17.4054500440514
- 333
- 9.42503547248425
- 334
- 2.23316186453625
- 335
- 12.8167116654003
- 336
- 13.5482496677939
- 337
- 16.0088774940271
- 338
- 18.792522100538
- 339
- 16.645410561045
- 340
- 11.9521667534805
- 341
- 11.7716573762665
- 342
- 17.6524628760284
- 343
- 18.9350295036017
- 344
- 17.3294460957507
- 345
- 16.2083878583163
- 346
- 17.9849801498437
- 347
- 17.7094658372539
- 348
- 18.1464885399826
- 349
- 18.6500146974743
- 350
- 16.7784174705711
- 351
- 17.3294460957507
- 352
- 16.4934026644437
- 353
- 18.4600048267227
- 354
- 19.1345398678909
- 355
- 20.5881153791406
- 356
- 18.9540304906769
- 357
- 20.6356178468285
- 358
- 21.2626504203088
- 359
- 24.7778330292134
- 360
- 21.9941884227025
- 361
- 21.1296435107827
- 362
- 18.2604944624336
- 363
- 14.2987886572627
- 364
- 17.3294460957507
- 365
- 20.5311124179152
- 366
- 19.0775369066654
- 367
- 22.3267056965178
- 368
- 20.9111321594184
- 369
- 23.4762654145649
- 370
- 17.3199456022131
- 371
- 11.6576514538155
- 372
- 16.8069189511838
- 373
- 10.8881114772716
- 374
- 17.4244510311265
- 375
- 22.0986938516158
- 376
- 24.3503108200223
- 377
- 27.2004588812963
- 378
- 27.8939949095396
- 379
- 24.6543266132248
- 380
- 21.8801825002515
- 381
- 24.5023187166236
- 382
- 20.3221015600884
- 383
- 23.6757757788541
- 384
- 17.3959495505138
- 385
- 11.7811578698041
- 386
- 6.35637605984594
- 387
- 17.3864490569762
- 388
- 21.8706820067139
- 389
- 23.1437481407496
- 390
- 21.642670161812
- 391
- 17.8329722532425
- 392
- 14.4697975409392
- 393
- 21.1581449913954
- 394
- 22.2792032288299
- 395
- 20.2080956376374
- 396
- 20.9396336400311
- 397
- 25.3668636285433
- 398
- 25.9273927472605
- 399
- 29.195562524188
- 400
- 28.3975210670313
- 401
- 27.0674519717701
residuals(lm.fit)
- 1
- -5.82259509766824
- 2
- -4.27038978603508
- 3
- 3.97485801626161
- 4
- 1.63930422066539
- 5
- 6.70992217614699
- 6
- -0.904083746303968
- 7
- 0.155272587828725
- 8
- 10.7396042450824
- 9
- 10.3811362785936
- 10
- 0.592003069878544
- 11
- -0.125331595032187
- 12
- -3.04668595501456
- 13
- 2.07143446815494
- 14
- -6.3064332173421
- 15
- -6.60633450982611
- 16
- -6.60692285305292
- 17
- -5.20251613165552
- 18
- -3.11661685975337
- 19
- -3.24776393395219
- 20
- -5.63728416899296
- 21
- -0.983803463390132
- 22
- -1.81465831691008
- 23
- -1.56891697703351
- 24
- -1.16685972667424
- 25
- -3.46803641312785
- 26
- -4.96852604883867
- 27
- -3.88360995022725
- 28
- -3.33698804644502
- 29
- -3.99320915128082
- 30
- -2.17224962136237
- 31
- -0.382725484452505
- 32
- -7.6651973063789
- 33
- 4.97202671325083
- 34
- -4.02043523792397
- 35
- -1.72983702394557
- 36
- -6.45736313500575
- 37
- -3.71377775300442
- 38
- -5.22190804692552
- 39
- -0.229840925814653
- 40
- 0.350372328851421
- 41
- 2.22725684105772
- 42
- -3.35560200719442
- 43
- -3.73405413404917
- 44
- -2.78547368742365
- 45
- -4.28086955099429
- 46
- -5.55383697751401
- 47
- -1.11064252370753
- 48
- -0.0929130287328689
- 49
- 9.1171797099006
- 50
- 0.23695865149635
- 51
- -2.07567707133812
- 52
- -5.09487547344525
- 53
- -4.53758029154091
- 54
- -3.14492482720324
- 55
- -1.59311044376484
- 56
- 5.41589651219283
- 57
- -4.37205610819949
- 58
- 0.798854067960964
- 59
- -4.73650231260328
- 60
- -6.19438583773442
- 61
- -3.36069187746552
- 62
- -4.83512821111771
- 63
- -5.96000872859182
- 64
- -0.528372018682187
- 65
- 6.09405641836873
- 66
- -6.61711039733328
- 67
- -5.42533549690128
- 68
- -4.85844111394337
- 69
- -4.717694838691
- 70
- -5.30290705985036
- 71
- -3.9695092221294
- 72
- -3.46735326425415
- 73
- -6.50956844663899
- 74
- -3.99046875204785
- 75
- -4.01250626090392
- 76
- -4.66039965678666
- 77
- -3.18175011489995
- 78
- -3.99683401628853
- 79
- -1.63023185400949
- 80
- -5.60839176018538
- 81
- -1.52807979800333
- 82
- -3.79448454525041
- 83
- -3.3695092221294
- 84
- -4.51897023266059
- 85
- -1.51436609623123
- 86
- -1.75001859934342
- 87
- 0.163793809944662
- 88
- -4.33542433366566
- 89
- -5.72856943371415
- 90
- -0.438559562962553
- 91
- -3.5839060727752
- 92
- -4.76343617856757
- 93
- -3.9014381527179
- 94
- -3.65403439254598
- 95
- -3.89281822308598
- 96
- 0.163987323107539
- 97
- -2.38028120776749
- 98
- 8.14586689993804
- 99
- 12.6378353135329
- 100
- 4.52696462037887
- 101
- 1.89562403301717
- 102
- -0.76696233605931
- 103
- -5.85481624893565
- 104
- -2.4851775648757
- 105
- -2.73973234754707
- 106
- 0.593471977011009
- 107
- 2.67408006174101
- 108
- -0.76764548493301
- 109
- -3.09673530877255
- 110
- -0.380573428446343
- 111
- -0.50319928052922
- 112
- -2.10133944520191
- 113
- -0.353540854966067
- 114
- 0.382502576340963
- 115
- -6.1258251326121
- 116
- -1.28106306415716
- 117
- -1.91524666013689
- 118
- -5.56833253567579
- 119
- 0.448417687877217
- 120
- -2.32366917473684
- 121
- 1.09836833411923
- 122
- -0.69663660125657
- 123
- 2.98054403349768
- 124
- 6.88691319960745
- 125
- 0.948026759682378
- 126
- 0.916390049772743
- 127
- 7.04450450405973
- 128
- -2.02249248828324
- 129
- -1.93258132504762
- 130
- -2.82993573146155
- 131
- -3.38321902203242
- 132
- -3.30623580231013
- 133
- -0.989292065594243
- 134
- -1.8745990924005
- 135
- -2.50848656583228
- 136
- -0.341003839647572
- 137
- -1.09800680087306
- 138
- -3.59262080805401
- 139
- -0.998788657262733
- 140
- 0.784070190989413
- 141
- 2.39935150740996
- 142
- 12.5373573834294
- 143
- 4.32648278840622
- 144
- 6.14646304690302
- 145
- 5.07310469218845
- 146
- 5.65753115508905
- 147
- -3.13551913931255
- 148
- 8.10111653709037
- 149
- 10.1515568190432
- 150
- 1.2247177587258
- 151
- 0.341855008604571
- 152
- -2.33718546147698
- 153
- -7.73924271183625
- 154
- -0.152561583544427
- 155
- -3.18909465056228
- 156
- -4.68409958593808
- 157
- -6.12004430972913
- 158
- 11.1068856543661
- 159
- -4.14502353471922
- 160
- -4.23297615511155
- 161
- -2.32856943371415
- 162
- 17.0897445026182
- 163
- 17.2702538798322
- 164
- 18.6003229750934
- 165
- -0.795266401640087
- 166
- -0.23385671901721
- 167
- 18.9613417295215
- 168
- 0.77975827523891
- 169
- -0.208293052669401
- 170
- -1.49928219484264
- 171
- -3.44462870465529
- 172
- -4.02474715367447
- 173
- 2.50238412732179
- 174
- -2.36539472141086
- 175
- -2.79536510915607
- 176
- -0.0900778238530104
- 177
- -1.74884191288981
- 178
- -3.97803044424534
- 179
- 1.9205006486222
- 180
- 7.43440786355718
- 181
- 12.4285322350273
- 182
- 10.6241255136299
- 183
- 7.92539700573042
- 184
- 3.34243944996229
- 185
- 5.12784908615361
- 186
- 7.53930812253448
- 187
- 19.67387874484
- 188
- 3.79248880372028
- 189
- -0.421615826246661
- 190
- 5.46692513737247
- 191
- 7.29141082478265
- 192
- 0.401890589741878
- 193
- 4.57280076590233
- 194
- 1.3249073700196
- 195
- -1.2926247099231
- 196
- 18.2678057012781
- 197
- 2.6223604839495
- 198
- 3.9260840564732
- 199
- 6.3354858424948
- 200
- 4.67838417375334
- 201
- ⋯
- 202
- 4.9929784394311
- 203
- 0.800030754414571
- 204
- -7.44061681332182
- 205
- -4.78184882241593
- 206
- -6.44521704788209
- 207
- -6.77254574391031
- 208
- -4.01926245333945
- 209
- -5.44845098469497
- 210
- -1.93738287650895
- 211
- -7.42827331116621
- 212
- 0.660563775000873
- 213
- 0.389945819519275
- 214
- -1.61132957445031
- 215
- -1.45971260604388
- 216
- -3.91348553232557
- 217
- -4.9270018190657
- 218
- -6.83846085544657
- 219
- -4.90026146626429
- 220
- -3.7395388343842
- 221
- -5.1275901622925
- 222
- -5.71103735377146
- 223
- -0.202709644818399
- 224
- -5.78184882241593
- 225
- -4.98047862279945
- 226
- -6.11789225372296
- 227
- -5.64472741217127
- 228
- -7.71495443945803
- 229
- -6.95756055003771
- 230
- -7.44100774151666
- 231
- -5.84394555578159
- 232
- -5.74335721255479
- 233
- -6.02131970369872
- 234
- -5.8689208789026
- 235
- -6.30036017378027
- 236
- -7.02788238297137
- 237
- 3.36193007274827
- 238
- -9.83591396937648
- 239
- -3.83248651940073
- 240
- 1.02588664144124
- 241
- -7.04982118431146
- 242
- -5.31671556726936
- 243
- -5.41152698948228
- 244
- -4.36304525037273
- 245
- -2.35805018574853
- 246
- -5.97254574391032
- 247
- -5.23806992725173
- 248
- -8.55295641360835
- 249
- -0.17861878747214
- 250
- -8.70594358163127
- 251
- -8.66206597895109
- 252
- -0.0329722532424578
- 253
- -0.246685955014563
- 254
- -0.947274298241366
- 255
- 0.0832844327306455
- 256
- -2.15295641360835
- 257
- -1.17264054955721
- 258
- -4.07283796458917
- 259
- -3.84511834036611
- 260
- -7.62807979800333
- 261
- -0.289489480626205
- 262
- 0.646850073228771
- 263
- 1.21031700621092
- 264
- 18.5433200138679
- 265
- 18.9898432101342
- 266
- 18.2583052077405
- 267
- 24.5001294619305
- 268
- 23.8825973819879
- 269
- 12.2793751507822
- 270
- 15.3195330828078
- 271
- -6.7851775648757
- 272
- 1.42530610195261
- 273
- -1.07479260556337
- 274
- 1.05282831114371
- 275
- -3.66176595453406
- 276
- -7.80349150120808
- 277
- -3.62680050216465
- 278
- -0.832676130694513
- 279
- 1.07937124891316
- 280
- 3.34617082622416
- 281
- 1.9171797099006
- 282
- 2.81355484489288
- 283
- 3.23823794733503
- 284
- 4.73667033268658
- 285
- -3.24531185352899
- 286
- -3.19849643658388
- 287
- 6.46908499711681
- 288
- -0.456573474877894
- 289
- -6.34159218287438
- 290
- -6.32053394543995
- 291
- -5.18899594304629
- 292
- -3.65138489709082
- 293
- -7.12885775252392
- 294
- -0.49183114792616
- 295
- 0.219138252743885
- 296
- -3.52101967928168
- 297
- -8.04883801102073
- 298
- -3.15833850455831
- 299
- -7.47136515558762
- 300
- -0.0414895734893102
- 301
- -7.72170673002447
- 302
- -0.479688962671593
- 303
- 4.87025778170133
- 304
- 7.72746205982786
- 305
- 11.73813533795
- 306
- -9.94884191288981
- 307
- 2.80620640736146
- 308
- 15.999355409279
- 309
- 0.823150144077357
- 310
- 7.5789842225874
- 311
- 0.245092847286537
- 312
- -2.55206804596452
- 313
- 1.15547390472978
- 314
- -6.16382320489333
- 315
- -4.54971857492639
- 316
- -3.58409958593808
- 317
- -5.43806602538264
- 318
- -0.358144991395428
- 319
- 0.972808569640508
- 320
- -6.55099396889598
- 321
- -3.0821371412257
- 322
- -9.44756651892022
- 323
- -9.85912426281707
- 324
- -3.10877878651114
- 325
- -2.17665244089068
- 326
- -3.29497027909214
- 327
- -1.74736910388826
- 328
- -7.02474715367447
- 329
- -4.84404036142849
- 330
- -8.44159218287438
- 331
- 0.953807582565348
- 332
- -7.80545004405136
- 333
- -0.725035472484243
- 334
- 6.16683813546375
- 335
- -0.0167116654002687
- 336
- -3.04824966779392
- 337
- 1.09112250597288
- 338
- -0.392522100538032
- 339
- -1.24541056104497
- 340
- -1.1521667534805
- 341
- 0.0283426237335231
- 342
- -2.75246287602844
- 343
- -6.33502950360173
- 344
- -3.22944609575073
- 345
- -3.2083878583163
- 346
- -4.58498014984374
- 347
- -2.50946583725392
- 348
- -2.0464885399826
- 349
- -0.850014697474333
- 350
- -1.87841747057109
- 351
- -3.22944609575073
- 352
- -3.79340266444369
- 353
- -4.96000482672273
- 354
- -4.23453986789091
- 355
- -0.588115379140635
- 356
- -2.55403049067689
- 357
- -2.93561784682853
- 358
- -1.76265042030881
- 359
- -4.57783302921338
- 360
- -0.594188422702462
- 361
- -1.22964351078269
- 362
- 0.739505537566442
- 363
- 4.80121134273727
- 364
- 1.77055390424928
- 365
- -0.431112417915154
- 366
- 0.822463093334569
- 367
- -2.72670569651776
- 368
- 2.28886784058165
- 369
- 6.32373458543508
- 370
- -3.51994560221314
- 371
- 1.64234854618448
- 372
- -0.106918951183831
- 373
- 1.11188852272845
- 374
- -2.82445103112652
- 375
- -0.698693851615842
- 376
- -1.35031082002228
- 377
- -3.50045888129625
- 378
- -2.89399490953958
- 379
- -2.85432661322484
- 380
- -1.2801825002515
- 381
- -3.30231871662356
- 382
- -1.2221015600884
- 383
- -3.0757757788541
- 384
- -2.19594955051379
- 385
- -4.78115786980406
- 386
- 1.74362394015407
- 387
- -3.7864490569762
- 388
- -1.77068200671392
- 389
- -1.34374814074963
- 390
- 2.857329838188
- 391
- 5.26702774675754
- 392
- 5.23020245906083
- 393
- -2.85814499139543
- 394
- -1.07920322882986
- 395
- -2.70809563763744
- 396
- -4.13963364003109
- 397
- -2.96686362854333
- 398
- -5.32739274726054
- 399
- -5.29556252418803
- 400
- -6.39752106703132
- 401
- -15.1674519717701
rstudent(lm.fit)
- 1
- -0.938639094864759
- 2
- -0.687511237980931
- 3
- 0.640666347497302
- 4
- 0.264236181293794
- 5
- 1.08188171555607
- 6
- -0.145609733121283
- 7
- 0.024980406942384
- 8
- 1.73437879342761
- 9
- 1.68464086525454
- 10
- 0.0952794075069099
- 11
- -0.0201873355661056
- 12
- -0.490273791133087
- 13
- 0.333351306075818
- 14
- -1.0160056473743
- 15
- -1.06414618772446
- 16
- -1.0644855697941
- 17
- -0.838170499781684
- 18
- -0.501568661733041
- 19
- -0.522653922923515
- 20
- -0.907706171314735
- 21
- -0.158494972972672
- 22
- -0.291976161437516
- 23
- -0.252605464147521
- 24
- -0.187922873024383
- 25
- -0.558257512514491
- 26
- -0.800080140062439
- 27
- -0.625097216836159
- 28
- -0.537234871671181
- 29
- -0.642694395922744
- 30
- -0.349518620546228
- 31
- -0.0616922494431119
- 32
- -1.23505454754158
- 33
- 0.803970819135612
- 34
- -0.647488997056111
- 35
- -0.278639407928494
- 36
- -1.04016085356967
- 37
- -0.597705381713567
- 38
- -0.840941446675589
- 39
- -0.036981606345384
- 40
- 0.0564446368737533
- 41
- 0.359165483675643
- 42
- -0.540650973006434
- 43
- -0.601501580145319
- 44
- -0.448455773129773
- 45
- -0.689163599538104
- 46
- -0.894319282545907
- 47
- -0.178694441401907
- 48
- -0.0149589191291224
- 49
- 1.4794858705669
- 50
- 0.0381314696551221
- 51
- -0.333977592207178
- 52
- -0.820381277981058
- 53
- -0.731170741203512
- 54
- -0.506262368097963
- 55
- -0.256340498952637
- 56
- 0.873018618280602
- 57
- -0.704375891600158
- 58
- 0.1287121420649
- 59
- -0.762951011976889
- 60
- -0.997771831970615
- 61
- -0.540830339214282
- 62
- -0.778395675895519
- 63
- -0.960384952830883
- 64
- -0.0850219324765685
- 65
- 0.981759898912627
- 66
- -1.06709365156242
- 67
- -0.873592790591082
- 68
- -0.782420661418683
- 69
- -0.759424000566502
- 70
- -0.854001800681364
- 71
- -0.639314908989401
- 72
- -0.558086499234794
- 73
- -1.04944931507711
- 74
- -0.642579871579307
- 75
- -0.646236651447894
- 76
- -0.750389335714497
- 77
- -0.512020308202478
- 78
- -0.643349155389836
- 79
- -0.262291326007357
- 80
- -0.903235442588003
- 81
- -0.246113107598036
- 82
- -0.611036956159752
- 83
- -0.542619457064744
- 84
- -0.727772494046973
- 85
- -0.243690320960155
- 86
- -0.281772080751924
- 87
- 0.0263513078420176
- 88
- -0.698065093811022
- 89
- -0.923316630282838
- 90
- -0.0706225850927437
- 91
- -0.57693795832973
- 92
- -0.767089184178313
- 93
- -0.62815910651346
- 94
- -0.588541594911102
- 95
- -0.626575681977055
- 96
- 0.0264009337714837
- 97
- -0.383010062219545
- 98
- 1.31458665056151
- 99
- 2.04490270889228
- 100
- 0.729279176130483
- 101
- 0.305059628056721
- 102
- -0.123450978871037
- 103
- -0.942834136459934
- 104
- -0.399885545990373
- 105
- -0.440856181349915
- 106
- 0.0955061434621353
- 107
- 0.430589599898271
- 108
- -0.123506199698595
- 109
- -0.498329426694678
- 110
- -0.0612371578224353
- 111
- -0.0809557845134348
- 112
- -0.338144334982185
- 113
- -0.0568920983768528
- 114
- 0.0615611068291001
- 115
- -0.986573434264438
- 116
- -0.206145782948235
- 117
- -0.308157617977023
- 118
- -0.896649645697288
- 119
- 0.0721525005074296
- 120
- -0.373892198671318
- 121
- 0.176721919963297
- 122
- -0.112082505388073
- 123
- 0.479881886471635
- 124
- 1.11286365810549
- 125
- 0.15259484627943
- 126
- 0.147446031766052
- 127
- 1.13952674395143
- 128
- -0.325544731183282
- 129
- -0.310990306732437
- 130
- -0.45566286253898
- 131
- -0.544455135692867
- 132
- -0.532060886869863
- 133
- -0.159169315590762
- 134
- -0.301647303771584
- 135
- -0.403802958058385
- 136
- -0.0548809192591364
- 137
- -0.176715636642752
- 138
- -0.578217717691548
- 139
- -0.160925319075748
- 140
- 0.12622661479639
- 141
- 0.387065239338443
- 142
- 2.04429419156623
- 143
- 0.69912086424294
- 144
- 0.993487054197097
- 145
- 0.821145206970403
- 146
- 0.915038277113709
- 147
- -0.504729222338138
- 148
- 1.31283945757636
- 149
- 1.64545552575292
- 150
- 0.197338427163905
- 151
- 0.0550002888803532
- 152
- -0.376063834265179
- 153
- -1.24702508871445
- 154
- -0.0245489336227306
- 155
- -0.51325898051222
- 156
- -0.754089401399369
- 157
- -0.9857810285316
- 158
- 1.79486848319044
- 159
- -0.66765302450966
- 160
- -0.681686237418824
- 161
- -0.375047834366134
- 162
- 2.776857474875
- 163
- 2.80641553490693
- 164
- 3.02465385544699
- 165
- -0.127947675768017
- 166
- -0.0376290101600851
- 167
- 3.08403360632422
- 168
- 0.125450685410765
- 169
- -0.0335119482952013
- 170
- -0.241227905385803
- 171
- -0.554377710193746
- 172
- -0.647779179584251
- 173
- 0.402682775792712
- 174
- -0.380697943272136
- 175
- -0.449890315756566
- 176
- -0.0145069316695349
- 177
- -0.281412518706953
- 178
- -0.640754481247913
- 179
- 0.309198678276889
- 180
- 1.19911407015378
- 181
- 2.00852742128645
- 182
- 1.71454660209093
- 183
- 1.2786412968589
- 184
- 0.538398549975597
- 185
- 0.825559098495008
- 186
- 1.21471296328102
- 187
- 3.20142553205642
- 188
- 0.610788667777058
- 189
- -0.0679167677303175
- 190
- 0.881106085020471
- 191
- 1.17596476503487
- 192
- 0.0647366288625731
- 193
- 0.737447464660288
- 194
- 0.213403102901679
- 195
- -0.208244836401842
- 196
- 2.97001811452456
- 197
- 0.422566777182913
- 198
- 0.632083106135286
- 199
- 1.02103653551995
- 200
- 0.754048566254773
- 201
- ⋯
- 202
- 0.804388978350195
- 203
- 0.128777422611268
- 204
- -1.20030295890086
- 205
- -0.769868267606131
- 206
- -1.03802122837741
- 207
- -1.09180864406126
- 208
- -0.646901763556554
- 209
- -0.877607396632731
- 210
- -0.31178670729707
- 211
- -1.19679729115933
- 212
- 0.106339868681343
- 213
- 0.0627482035185011
- 214
- -0.259275723491083
- 215
- -0.234852531718946
- 216
- -0.63021760908267
- 217
- -0.793672372587439
- 218
- -1.10203002226353
- 219
- -0.788857951536903
- 220
- -0.602337010663921
- 221
- -0.826412851534179
- 222
- -0.920326588292935
- 223
- -0.0326121219947374
- 224
- -0.931119771731159
- 225
- -0.802216119638659
- 226
- -0.985444032272186
- 227
- -0.908874127026561
- 228
- -1.2436572447646
- 229
- -1.12179839961244
- 230
- -1.19963958253274
- 231
- -0.941401012325245
- 232
- -0.924927401325345
- 233
- -0.969702005023086
- 234
- -0.945350419750092
- 235
- -1.01481118715438
- 236
- -1.13234017445797
- 237
- 0.541568283829525
- 238
- -1.58685978034957
- 239
- -0.617166384035214
- 240
- 0.165257981314676
- 241
- -1.13573328346223
- 242
- -0.85597984281965
- 243
- -0.871940156041158
- 244
- -0.702881872669362
- 245
- -0.379757278916723
- 246
- -0.962586354237621
- 247
- -0.844144245158181
- 248
- -1.37924008883763
- 249
- -0.0287735776198971
- 250
- -1.40393891153959
- 251
- -1.39762753256173
- 252
- -0.0053071275653646
- 253
- -0.0396873725061245
- 254
- -0.152405779234805
- 255
- 0.0133988627945913
- 256
- -0.346570037550887
- 257
- -0.188670613248351
- 258
- -0.65559841350995
- 259
- -0.618888461862466
- 260
- -1.23035463813468
- 261
- -0.0466011426802584
- 262
- 0.104070505480749
- 263
- 0.19472575844683
- 264
- 3.01528395936497
- 265
- 3.08872424957903
- 266
- 2.96845786558646
- 267
- 4.00470254715351
- 268
- 3.90101959002739
- 269
- 2.00252183767457
- 270
- 2.51153744826928
- 271
- -1.0929128169075
- 272
- 0.229817986413809
- 273
- -0.173167152969462
- 274
- 0.16978747888872
- 275
- -0.590268911700114
- 276
- -1.25791280754907
- 277
- -0.584488453546384
- 278
- -0.134277350657285
- 279
- 0.174136220392089
- 280
- 0.541909292011534
- 281
- 0.310464422627529
- 282
- 0.454906151622062
- 283
- 0.524945339812025
- 284
- 0.76731884233673
- 285
- -0.522934029405484
- 286
- -0.514911063302858
- 287
- 1.04263188424416
- 288
- -0.073697979200531
- 289
- -1.02142339440823
- 290
- -1.01817018246058
- 291
- -0.835713290468071
- 292
- -0.588155892888511
- 293
- -1.14958310297145
- 294
- -0.0796266096622211
- 295
- 0.0354626565259144
- 296
- -0.568856133454207
- 297
- -1.29855744526259
- 298
- -0.50882724127053
- 299
- -1.20491999711317
- 300
- -0.00670321574465496
- 301
- -1.24678127191027
- 302
- -0.0773452117295426
- 303
- 0.784013399406782
- 304
- 1.24973132802751
- 305
- 1.89705886958375
- 306
- -1.60487152215782
- 307
- 0.452202403995784
- 308
- 2.61554152680479
- 309
- 0.132573648052529
- 310
- 1.23546787820642
- 311
- 0.0396386501267494
- 312
- -0.412033536057047
- 313
- 0.186611501515953
- 314
- -0.993841470310478
- 315
- -0.733807662066663
- 316
- -0.576866139413773
- 317
- -0.875704860059964
- 318
- -0.0576211647379782
- 319
- 0.156855620546657
- 320
- -1.05551241227763
- 321
- -0.497312911081858
- 322
- -1.52370853505663
- 323
- -1.59023312199085
- 324
- -0.501034325983144
- 325
- -0.351117397176642
- 326
- -0.530502227107625
- 327
- -0.281411371107686
- 328
- -1.13159260478024
- 329
- -0.779976965468237
- 330
- -1.36075464001696
- 331
- 0.153790479091726
- 332
- -1.25843504117571
- 333
- -0.117080203009323
- 334
- 1.00205503645968
- 335
- -0.00269407220158684
- 336
- -0.491377992269896
- 337
- 0.175707232492936
- 338
- -0.0631685425433482
- 339
- -0.20052042870578
- 340
- -0.185816694023616
- 341
- 0.00457120589849792
- 342
- -0.443132208090307
- 343
- -1.02052349759131
- 344
- -0.519998816277143
- 345
- -0.516750459377634
- 346
- -0.738363817634674
- 347
- -0.403992816743916
- 348
- -0.329412713277216
- 349
- -0.136797987662696
- 350
- -0.302444583880949
- 351
- -0.519998816277143
- 352
- -0.610992338801498
- 353
- -0.798760747322518
- 354
- -0.681740240956423
- 355
- -0.0946249869794545
- 356
- -0.411076797985518
- 357
- -0.472425760429892
- 358
- -0.283609020401765
- 359
- -0.736962684664491
- 360
- -0.0955949332581354
- 361
- -0.197842038256283
- 362
- 0.119020699139877
- 363
- 0.77401415024766
- 364
- 0.28503739418057
- 365
- -0.0693640027553692
- 366
- 0.132354829597394
- 367
- -0.438758741606478
- 368
- 0.368305255166795
- 369
- 1.01843496425756
- 370
- -0.566804306824535
- 371
- 0.264916833516439
- 372
- -0.0172133721367554
- 373
- 0.179412287645274
- 374
- -0.454748986254468
- 375
- -0.112408245269015
- 376
- -0.217264652947644
- 377
- -0.563598889403136
- 378
- -0.46597725787251
- 379
- -0.459347016517151
- 380
- -0.205967270780368
- 381
- -0.531473032865701
- 382
- -0.196640862747838
- 383
- -0.494967482280201
- 384
- -0.35353098051867
- 385
- -0.771576050351409
- 386
- 0.28213345512461
- 387
- -0.609739941079116
- 388
- -0.28489428822031
- 389
- -0.216194964942242
- 390
- 0.459793755622783
- 391
- 0.84837328753545
- 392
- 0.843215665546847
- 393
- -0.459935866439256
- 394
- -0.173628446760455
- 395
- -0.435813419050483
- 396
- -0.666317591168703
- 397
- -0.4775020260264
- 398
- -0.857914768513885
- 399
- -0.853387417647027
- 400
- -1.03108965348273
- 401
- -2.45582479352552
plot(predict(lm.fit), residuals(lm.fit))
plot(predict(lm.fit), rstudent(lm.fit))
根据残差图,有一些非线性的证据。可以使用hatvalues()函数为任意数量的预测变量计算统计信息
plot(hatvalues(lm.fit))
which.max(hatvalues(lm.fit))
Multiple Linear Regression 多元线性回归
为了使用最小二乘法拟合多元线性回归模型,我们再次使用lm()函数。语法lm(y ~ x1 + x2 + x3)用于拟合一个具有三个预测因子x1、x2和x3的模型。summary()函数现在输出所有预测变量的回归系数。
lm.fit <- lm(medv ~ lstat + age, data = Boston)
summary(lm.fit)
Call:
lm(formula = medv ~ lstat + age, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-15.981 -3.978 -1.283 1.968 23.158
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 33.22276 0.73085 45.458 < 2e-16 ***
lstat -1.03207 0.04819 -21.416 < 2e-16 ***
age 0.03454 0.01223 2.826 0.00491 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.173 on 503 degrees of freedom
Multiple R-squared: 0.5513, Adjusted R-squared: 0.5495
F-statistic: 309 on 2 and 503 DF, p-value: < 2.2e-16
# 对所有的12个变量进行的多元线性回归
lm.fit <- lm(medv ~ ., data = Boston)
summary(lm.fit)
Call:
lm(formula = medv ~ ., data = Boston)
Residuals:
Min 1Q Median 3Q Max
-15.1304 -2.7673 -0.5814 1.9414 26.2526
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 41.617270 4.936039 8.431 3.79e-16 ***
crim -0.121389 0.033000 -3.678 0.000261 ***
zn 0.046963 0.013879 3.384 0.000772 ***
indus 0.013468 0.062145 0.217 0.828520
chas 2.839993 0.870007 3.264 0.001173 **
nox -18.758022 3.851355 -4.870 1.50e-06 ***
rm 3.658119 0.420246 8.705 < 2e-16 ***
age 0.003611 0.013329 0.271 0.786595
dis -1.490754 0.201623 -7.394 6.17e-13 ***
rad 0.289405 0.066908 4.325 1.84e-05 ***
tax -0.012682 0.003801 -3.337 0.000912 ***
ptratio -0.937533 0.132206 -7.091 4.63e-12 ***
lstat -0.552019 0.050659 -10.897 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.798 on 493 degrees of freedom
Multiple R-squared: 0.7343, Adjusted R-squared: 0.7278
F-statistic: 113.5 on 12 and 493 DF, p-value: < 2.2e-16
?summary.lm # 查看summary对象中的参数
summary.lm package:stats R Documentation
_S_u_m_m_a_r_i_z_i_n_g _L_i_n_e_a_r _M_o_d_e_l _F_i_t_s
_D_e_s_c_r_i_p_t_i_o_n:
‘summary’ method for class ‘"lm"’.
_U_s_a_g_e:
## S3 method for class 'lm'
summary(object, correlation = FALSE, symbolic.cor = FALSE, ...)
## S3 method for class 'summary.lm'
print(x, digits = max(3, getOption("digits") - 3),
symbolic.cor = x$symbolic.cor,
signif.stars = getOption("show.signif.stars"), ...)
_A_r_g_u_m_e_n_t_s:
object: an object of class ‘"lm"’, usually, a result of a call to
‘lm’.
x: an object of class ‘"summary.lm"’, usually, a result of a
call to ‘summary.lm’.
correlation: logical; if ‘TRUE’, the correlation matrix of the
estimated parameters is returned and printed.
digits: the number of significant digits to use when printing.
symbolic.cor: logical. If ‘TRUE’, print the correlations in a symbolic
form (see ‘symnum’) rather than as numbers.
signif.stars: logical. If ‘TRUE’, ‘significance stars’ are printed for
each coefficient.
...: further arguments passed to or from other methods.
_D_e_t_a_i_l_s:
‘print.summary.lm’ tries to be smart about formatting the
coefficients, standard errors, etc. and additionally gives
‘significance stars’ if ‘signif.stars’ is ‘TRUE’.
Aliased coefficients are omitted in the returned object but
restored by the ‘print’ method.
Correlations are printed to two decimal places (or symbolically):
to see the actual correlations print ‘summary(object)$correlation’
directly.
_V_a_l_u_e:
The function ‘summary.lm’ computes and returns a list of summary
statistics of the fitted linear model given in ‘object’, using the
components (list elements) ‘"call"’ and ‘"terms"’ from its
argument, plus
residuals: the _weighted_ residuals, the usual residuals rescaled by
the square root of the weights specified in the call to ‘lm’.
coefficients: a p x 4 matrix with columns for the estimated
coefficient, its standard error, t-statistic and
corresponding (two-sided) p-value. Aliased coefficients are
omitted.
aliased: named logical vector showing if the original coefficients are
aliased.
sigma: the square root of the estimated variance of the random error
sigma^2 = 1/(n-p) Sum(w[i] R[i]^2),
where R[i] is the i-th residual, ‘residuals[i]’.
df: degrees of freedom, a 3-vector (p, n-p, p*), the first being
the number of non-aliased coefficients, the last being the
total number of coefficients.
fstatistic: (for models including non-intercept terms) a 3-vector with
the value of the F-statistic with its numerator and
denominator degrees of freedom.
r.squared: R^2, the ‘fraction of variance explained by the model’,
R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2),
where y* is the mean of y[i] if there is an intercept and
zero otherwise.
adj.r.squared: the above R^2 statistic ‘_adjusted_’, penalizing for
higher p.
cov.unscaled: a p x p matrix of (unscaled) covariances of the coef[j],
j=1, ..., p.
correlation: the correlation matrix corresponding to the above
‘cov.unscaled’, if ‘correlation = TRUE’ is specified.
symbolic.cor: (only if ‘correlation’ is true.) The value of the
argument ‘symbolic.cor’.
na.action: from ‘object’, if present there.
_S_e_e _A_l_s_o:
The model fitting function ‘lm’, ‘summary’.
Function ‘coef’ will extract the matrix of coefficients with
standard errors, t-statistics and p-values.
_E_x_a_m_p_l_e_s:
##-- Continuing the lm(.) example:
coef(lm.D90) # the bare coefficients
sld90 <- summary(lm.D90 <- lm(weight ~ group -1)) # omitting intercept
sld90
coef(sld90) # much more
## model with *aliased* coefficient:
lm.D9. <- lm(weight ~ group + I(group != "Ctl"))
Sm.D9. <- summary(lm.D9.)
Sm.D9. # shows the NA NA NA NA line
stopifnot(length(cc <- coef(lm.D9.)) == 3, is.na(cc[3]),
dim(coef(Sm.D9.)) == c(2,4), Sm.D9.$df == c(2, 18, 3))
summary(object = lm.fit)
Call:
lm(formula = medv ~ ., data = Boston)
Residuals:
Min 1Q Median 3Q Max
-15.1304 -2.7673 -0.5814 1.9414 26.2526
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 41.617270 4.936039 8.431 3.79e-16 ***
crim -0.121389 0.033000 -3.678 0.000261 ***
zn 0.046963 0.013879 3.384 0.000772 ***
indus 0.013468 0.062145 0.217 0.828520
chas 2.839993 0.870007 3.264 0.001173 **
nox -18.758022 3.851355 -4.870 1.50e-06 ***
rm 3.658119 0.420246 8.705 < 2e-16 ***
age 0.003611 0.013329 0.271 0.786595
dis -1.490754 0.201623 -7.394 6.17e-13 ***
rad 0.289405 0.066908 4.325 1.84e-05 ***
tax -0.012682 0.003801 -3.337 0.000912 ***
ptratio -0.937533 0.132206 -7.091 4.63e-12 ***
lstat -0.552019 0.050659 -10.897 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.798 on 493 degrees of freedom
Multiple R-squared: 0.7343, Adjusted R-squared: 0.7278
F-statistic: 113.5 on 12 and 493 DF, p-value: < 2.2e-16
summary(lm.fit)$r.sq # 模型的R方,未矫正前的
summary(lm.fit)$sigma # 模型的RSE,残差标准误差
# car包中的vif函数,可以用于计算方差膨胀因子,variance infation factors
# install.packages("car")
library(car)
vif(lm.fit)
# 使用vif函数取不同自变量的方差膨胀因子,并进行开根号。如果大于2,代表自变量与其他变量存在共线性,此时需要对模型进行优化
Updating HTML index of packages in '.Library' Making 'packages.html' ... done Loading required package: carData
- crim
- 1.76748591543102
- zn
- 2.29845890773582
- indus
- 3.9871806307571
- chas
- 1.07116777375841
- nox
- 4.3690926228448
- rm
- 1.91253243743689
- age
- 3.0882320397312
- dis
- 3.95403664162831
- rad
- 7.44530076006986
- tax
- 9.00215766347181
- ptratio
- 1.79705959312978
- lstat
- 2.87077650084175
print(vif(lm.fit))
crim zn indus chas nox rm age dis
1.767486 2.298459 3.987181 1.071168 4.369093 1.912532 3.088232 3.954037
rad tax ptratio lstat
7.445301 9.002158 1.797060 2.870777
如果我们想使用除一个变量之外的所有变量执行回归呢?例如,在上述回归输出中,年龄具有较高的p值。因此,我们可能希望运行一个回归排除这个预测因素。下面的语法将使用除年龄之外的所有预测因子进行回归。
lm.fit1 <- lm(medv ~ . - age, data = Boston)
summary(lm.fit1)
Call:
lm(formula = medv ~ . - age, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-15.1851 -2.7330 -0.6116 1.8555 26.3838
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 41.525128 4.919684 8.441 3.52e-16 ***
crim -0.121426 0.032969 -3.683 0.000256 ***
zn 0.046512 0.013766 3.379 0.000785 ***
indus 0.013451 0.062086 0.217 0.828577
chas 2.852773 0.867912 3.287 0.001085 **
nox -18.485070 3.713714 -4.978 8.91e-07 ***
rm 3.681070 0.411230 8.951 < 2e-16 ***
dis -1.506777 0.192570 -7.825 3.12e-14 ***
rad 0.287940 0.066627 4.322 1.87e-05 ***
tax -0.012653 0.003796 -3.333 0.000923 ***
ptratio -0.934649 0.131653 -7.099 4.39e-12 ***
lstat -0.547409 0.047669 -11.483 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.794 on 494 degrees of freedom
Multiple R-squared: 0.7343, Adjusted R-squared: 0.7284
F-statistic: 124.1 on 11 and 494 DF, p-value: < 2.2e-16
lm.fit2 <- lm(medv ~ . - age - indus, data = Boston)
summary(lm.fit2)
Call:
lm(formula = medv ~ . - age - indus, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-15.1814 -2.7625 -0.6243 1.8448 26.3920
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 41.451747 4.903283 8.454 3.18e-16 ***
crim -0.121665 0.032919 -3.696 0.000244 ***
zn 0.046191 0.013673 3.378 0.000787 ***
chas 2.871873 0.862591 3.329 0.000935 ***
nox -18.262427 3.565247 -5.122 4.33e-07 ***
rm 3.672957 0.409127 8.978 < 2e-16 ***
dis -1.515951 0.187675 -8.078 5.08e-15 ***
rad 0.283932 0.063945 4.440 1.11e-05 ***
tax -0.012292 0.003407 -3.608 0.000340 ***
ptratio -0.930961 0.130423 -7.138 3.39e-12 ***
lstat -0.546509 0.047442 -11.519 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.789 on 495 degrees of freedom
Multiple R-squared: 0.7342, Adjusted R-squared: 0.7289
F-statistic: 136.8 on 10 and 495 DF, p-value: < 2.2e-16
# 或者,也可以使用update函数 # update and (by default) re-fit a model
lm.fit3 <- update(lm.fit, ~ . - age)
summary(lm.fit3)
Call:
lm(formula = medv ~ crim + zn + indus + chas + nox + rm + dis +
rad + tax + ptratio + lstat, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-15.1851 -2.7330 -0.6116 1.8555 26.3838
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 41.525128 4.919684 8.441 3.52e-16 ***
crim -0.121426 0.032969 -3.683 0.000256 ***
zn 0.046512 0.013766 3.379 0.000785 ***
indus 0.013451 0.062086 0.217 0.828577
chas 2.852773 0.867912 3.287 0.001085 **
nox -18.485070 3.713714 -4.978 8.91e-07 ***
rm 3.681070 0.411230 8.951 < 2e-16 ***
dis -1.506777 0.192570 -7.825 3.12e-14 ***
rad 0.287940 0.066627 4.322 1.87e-05 ***
tax -0.012653 0.003796 -3.333 0.000923 ***
ptratio -0.934649 0.131653 -7.099 4.39e-12 ***
lstat -0.547409 0.047669 -11.483 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.794 on 494 degrees of freedom
Multiple R-squared: 0.7343, Adjusted R-squared: 0.7284
F-statistic: 124.1 on 11 and 494 DF, p-value: < 2.2e-16
使用lm()函数可以很容易地在线性模型中包括交互项。
语法lstat:age告诉R在lstat和age之间包含一个交互项。
语法lstat * age同时包括lstat、age和交互项lstat×age作为预测符;它是lstat + age + lstat:age的缩写。
多个变量之间的交互作用:https://zh.wikipedia.org/wiki/%E4%BA%A4%E4%BA%92%E4%BD%9C%E7%94%A8_(%E7%BB%9F%E8%AE%A1%E5%AD%A6)
其中一个原因变量对结果的影响依赖于第二个原因变量的状态(即两者的影响原因不是简单累加的)
age (proportion of owner-occupied units built prior to 1940), and lstat (percent of households with low socioeconomic status).
age(1940年之前建造的自住单元的比例)和lstat(社会经济地位较低的家庭的百分比)
来预测medv房假中位数
summary(lm(medv ~ lstat * age, data = Boston))
Call:
lm(formula = medv ~ lstat * age, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-15.806 -4.045 -1.333 2.085 27.552
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.0885359 1.4698355 24.553 < 2e-16 ***
lstat -1.3921168 0.1674555 -8.313 8.78e-16 ***
age -0.0007209 0.0198792 -0.036 0.9711
lstat:age 0.0041560 0.0018518 2.244 0.0252 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.149 on 502 degrees of freedom
Multiple R-squared: 0.5557, Adjusted R-squared: 0.5531
F-statistic: 209.3 on 3 and 502 DF, p-value: < 2.2e-16
解读参考如下:
Non-linear Transformations of the Predictors 预测变量的非线性转换
lm()函数还可以适应预测变量的非线性转换。
例如,给定预测器变量X,我们可以在公式对象中创建预测变量X2;这样的包装允许I(X^2)的标准用法。
函数I()是必需的,因为^有一个特殊的意义R,即提高X的2次方。现在我们将medv回归到lstat和lstat2。
lm.fit2 <- lm(medv ~ lstat + I(lstat^2),data = Boston) # 不能写成medv ~ lstat + lstat^2
summary(lm.fit2)
Call:
lm(formula = medv ~ lstat + I(lstat^2), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-15.2834 -3.8313 -0.5295 2.3095 25.4148
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 42.862007 0.872084 49.15 <2e-16 ***
lstat -2.332821 0.123803 -18.84 <2e-16 ***
I(lstat^2) 0.043547 0.003745 11.63 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.524 on 503 degrees of freedom
Multiple R-squared: 0.6407, Adjusted R-squared: 0.6393
F-statistic: 448.5 on 2 and 503 DF, p-value: < 2.2e-16
与二次项相关的接近零的p值表明它导致了一个改进的模型。我们使用anova()函数进一步量化二次拟合优于线性拟合的程度
线性拟合:
二次项拟合(非线性拟合):
lm.fit <- lm(medv ~ lstat,data = Boston)
summary(lm.fit)
anova(lm.fit, lm.fit2)
Call:
lm(formula = medv ~ lstat, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-15.168 -3.990 -1.318 2.034 24.500
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.55384 0.56263 61.41 <2e-16 ***
lstat -0.95005 0.03873 -24.53 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.216 on 504 degrees of freedom
Multiple R-squared: 0.5441, Adjusted R-squared: 0.5432
F-statistic: 601.6 on 1 and 504 DF, p-value: < 2.2e-16
| Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) | |
|---|---|---|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | |
| 1 | 504 | 19472.38 | NA | NA | NA | NA |
| 2 | 504 | 19472.38 | 0 | 0 | NA | NA |
lm.fit <- lm(medv ~ lstat,data = Boston)
lm.fit2 <- lm(medv ~ lstat + I(lstat^2),data = Boston)
anova(lm.fit, lm.fit2)
| Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) | |
|---|---|---|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | |
| 1 | 504 | 19472.38 | NA | NA | NA | NA |
| 2 | 503 | 15347.24 | 1 | 4125.138 | 135.1998 | 7.630116e-28 |
这里模型1表示线性子模型,只包含一个预测因子lstat,而模型2对应的是较大的二次模型,它有两个预测因子lstat和lstat2。
anova()函数执行一个比较两个模型的假设检验。零假设是两个模型都能很好地拟合数据,备择假设是完整模型更优。这里的f统计量是135,相关的p值几乎为零。
这提供了非常明确的证据,证明包含预测器lstat和lstat2的模型比只包含预测器lstat的模型要好得多。这并不奇怪,因为我们之前看到 medv和lstat之间非线性关系的证据
par(mfrow = c(2, 2))
plot(lm.fit2)
然后我们看到,当lstat2项包含在模型中时,在残差中几乎没有可识别的模式。
为了创建一个三次拟合,我们可以包括一个I(X^3)形式的预测器。
然而,对于高阶多项式,这种方法开始变得繁琐。更好的方法是在lm()中使用来创建多项式。
例如,下面的命令poly()函数poly()产生一个五阶多项式拟合:
lm.fit5 <- lm(medv ~ poly(lstat, 5),data = Boston)
summary(lm.fit5)
Call:
lm(formula = medv ~ poly(lstat, 5), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-13.5433 -3.1039 -0.7052 2.0844 27.1153
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.5328 0.2318 97.197 < 2e-16 ***
poly(lstat, 5)1 -152.4595 5.2148 -29.236 < 2e-16 ***
poly(lstat, 5)2 64.2272 5.2148 12.316 < 2e-16 ***
poly(lstat, 5)3 -27.0511 5.2148 -5.187 3.10e-07 ***
poly(lstat, 5)4 25.4517 5.2148 4.881 1.42e-06 ***
poly(lstat, 5)5 -19.2524 5.2148 -3.692 0.000247 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.215 on 500 degrees of freedom
Multiple R-squared: 0.6817, Adjusted R-squared: 0.6785
F-statistic: 214.2 on 5 and 500 DF, p-value: < 2.2e-16
这表明,包括额外的多项式项,最多五阶,导致模型拟合的改进!然而,数据的进一步调查表明,在回归拟合中,没有超过五阶的多项式项有显著的p值
默认情况下,poly()函数将预测函数正交:这意味着该函数输出的特征不只是参数的幂序列。
然而,应用于poly()函数输出的线性模型将具有与应用于原始多项式的线性模型相同的拟合值(尽管系数估计值、标准误和p值会有所不同)。
为了从poly()函数中获得原始的多项式,必须使用参数raw = TRUE。
# 当然,我们并不局限于使用多项式变换来进行预测。这里我们尝试一个对数变换
summary(lm(medv ~ log(lstat), data = Boston))
summary(lm(medv ~ log(rm), data = Boston))
Call:
lm(formula = medv ~ log(lstat), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-14.4599 -3.5006 -0.6686 2.1688 26.0129
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 52.1248 0.9652 54.00 <2e-16 ***
log(lstat) -12.4810 0.3946 -31.63 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.329 on 504 degrees of freedom
Multiple R-squared: 0.6649, Adjusted R-squared: 0.6643
F-statistic: 1000 on 1 and 504 DF, p-value: < 2.2e-16
Call:
lm(formula = medv ~ log(rm), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-19.487 -2.875 -0.104 2.837 39.816
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -76.488 5.028 -15.21 <2e-16 ***
log(rm) 54.055 2.739 19.73 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.915 on 504 degrees of freedom
Multiple R-squared: 0.4358, Adjusted R-squared: 0.4347
F-statistic: 389.3 on 1 and 504 DF, p-value: < 2.2e-16
# 生成正交多项式特征
x <- 1:10
poly_features <- poly(x, degree = 3)
print(poly_features)
# 生成原始多项式特征
raw_poly_features <- poly(x, degree = 3, raw = TRUE)
print(raw_poly_features)
1 2 3
[1,] -0.49543369 0.52223297 -0.4534252
[2,] -0.38533732 0.17407766 0.1511417
[3,] -0.27524094 -0.08703883 0.3778543
[4,] -0.16514456 -0.26111648 0.3346710
[5,] -0.05504819 -0.34815531 0.1295501
[6,] 0.05504819 -0.34815531 -0.1295501
[7,] 0.16514456 -0.26111648 -0.3346710
[8,] 0.27524094 -0.08703883 -0.3778543
[9,] 0.38533732 0.17407766 -0.1511417
[10,] 0.49543369 0.52223297 0.4534252
attr(,"coefs")
attr(,"coefs")$alpha
[1] 5.5 5.5 5.5
attr(,"coefs")$norm2
[1] 1.0 10.0 82.5 528.0 3088.8
attr(,"degree")
[1] 1 2 3
attr(,"class")
[1] "poly" "matrix"
1 2 3
[1,] 1 1 1
[2,] 2 4 8
[3,] 3 9 27
[4,] 4 16 64
[5,] 5 25 125
[6,] 6 36 216
[7,] 7 49 343
[8,] 8 64 512
[9,] 9 81 729
[10,] 10 100 1000
attr(,"degree")
[1] 1 2 3
attr(,"class")
[1] "poly" "matrix"
Qualitative Predictors 定性预测
# 现在使用的是其他的数据,ISLR2库中的Carseats数据集
# 我们将尝试基于一些预测因素来预测400个地点的销售Sales(儿童汽车座椅的销售)
head(Carseats)
| Sales | CompPrice | Income | Advertising | Population | Price | ShelveLoc | Age | Education | Urban | US | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <fct> | <dbl> | <dbl> | <fct> | <fct> | |
| 1 | 9.50 | 138 | 73 | 11 | 276 | 120 | Bad | 42 | 17 | Yes | Yes |
| 2 | 11.22 | 111 | 48 | 16 | 260 | 83 | Good | 65 | 10 | Yes | Yes |
| 3 | 10.06 | 113 | 35 | 10 | 269 | 80 | Medium | 59 | 12 | Yes | Yes |
| 4 | 7.40 | 117 | 100 | 4 | 466 | 97 | Medium | 55 | 14 | Yes | Yes |
| 5 | 4.15 | 141 | 64 | 3 | 340 | 128 | Bad | 38 | 13 | Yes | No |
| 6 | 10.81 | 124 | 113 | 13 | 501 | 72 | Bad | 78 | 16 | No | Yes |
我们可以看到这些预测变量中,除了定量的变量之外,还包括了定性的变量,比如说是Shelveloc,它是货架位置质量的一个指标,也就是商店中展示汽车座椅的空间;
然后这一个变量的取值总共有3个可能的值:差,中,好。
给定一个定性变量,如Shelveloc, R会自动生成哑变量。下面我们给出一个包含一些交互项的多元回归模型。
lm.fit <- lm(Sales ~ . + Income:Advertising + Price:Age, data = Carseats) # Income:Advertising表示Income和Advertising的交互项,以及原始数据中的定性项:ShelveLoc
summary(lm.fit)
Call:
lm(formula = Sales ~ . + Income:Advertising + Price:Age, data = Carseats)
Residuals:
Min 1Q Median 3Q Max
-2.9208 -0.7503 0.0177 0.6754 3.3413
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.5755654 1.0087470 6.519 2.22e-10 ***
CompPrice 0.0929371 0.0041183 22.567 < 2e-16 ***
Income 0.0108940 0.0026044 4.183 3.57e-05 ***
Advertising 0.0702462 0.0226091 3.107 0.002030 **
Population 0.0001592 0.0003679 0.433 0.665330
Price -0.1008064 0.0074399 -13.549 < 2e-16 ***
ShelveLocGood 4.8486762 0.1528378 31.724 < 2e-16 ***
ShelveLocMedium 1.9532620 0.1257682 15.531 < 2e-16 ***
Age -0.0579466 0.0159506 -3.633 0.000318 ***
Education -0.0208525 0.0196131 -1.063 0.288361
UrbanYes 0.1401597 0.1124019 1.247 0.213171
USYes -0.1575571 0.1489234 -1.058 0.290729
Income:Advertising 0.0007510 0.0002784 2.698 0.007290 **
Price:Age 0.0001068 0.0001333 0.801 0.423812
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.011 on 386 degrees of freedom
Multiple R-squared: 0.8761, Adjusted R-squared: 0.8719
F-statistic: 210 on 13 and 386 DF, p-value: < 2.2e-16
contrasts()函数的作用是:返回R使用的虚拟变量的编码。
attach(Carseats)
contrasts(ShelveLoc)
| Good | Medium | |
|---|---|---|
| Bad | 0 | 0 |
| Good | 1 | 0 |
| Medium | 0 | 1 |
R创建了一个ShelveLocGood虚拟变量,如果放置位置良好,该变量的值为1,否则为0。
它还创建了一个ShelveLocMedium虚拟变量,如果搁置位置是中等,则该变量等于1,否则等于0。
一个糟糕的搁置位置对应于两个虚拟变量的零。
事实上,ShelveLocGood在回归输出中的系数是正的,这表明一个好的货架位置与高销量相关(相对于一个不好的位置)。而ShelveLocMedium的正系数较小,表明中等货架位置的销量高于差货架位置,但低于好货架位置
Writing Function 自己手写函数
LoadLibraries <- function(){
library(ISLR2)
library(MASS)
print("the libraries ISLR2 & MASS have been loaded")
}
LoadLibraries #该函数的信息
LoadLibraries() #该函数的执行
function ()
{
library(ISLR2)
library(MASS)
print("the libraries ISLR2 & MASS have been loaded")
}
[1] "the libraries ISLR2 & MASS have been loaded"
二,练习部分
Section 3.7 Exercises 3, 4, 9, 13
3,
$y = \beta_0 + \beta_1 \cdot \text{GPA} + \beta_2 \cdot \text{IQ} + \beta_3 \cdot \text{Level} + \beta_4 \cdot \text{GPA} \cdot \text{IQ} + \beta_5 \cdot \text{GPA} \cdot \text{Level}$
$$ y = \beta_0 + \beta_1 \cdot \text{GPA} + \beta_2 \cdot \text{IQ} + \beta_3 \cdot \text{Level} + \beta_4 \cdot \text{GPA} \cdot \text{IQ} + \beta_5 \cdot \text{GPA} \cdot \text{Level} $$
其中,系数估计值为:
$$ \begin{align*} \hat{\beta}_0 &= 50 \\ \hat{\beta}_1 &= 20 \\ \hat{\beta}_2 &= 0.07 \\ \hat{\beta}_3 &= 35 \\ \hat{\beta}_4 &= 0.01 \\ \hat{\beta}_5 &= -10 \\ \end{align*} $$
假设我们有一个包含五个预测变量的数据集,分别是:X1 = GPA, X2 = IQ, X3 = Level(1 表示 College,0 表示 High School),X4 = GPA 和 IQ 的交互项,X5 = GPA 和 Level 的交互项。响应变量是毕业后的起始薪水(以千美元为单位)。假设我们使用最小二乘法拟合模型,并得到以下系数估计值:
$$ y = 50 + 20 \cdot \text{GPA} + 0.07 \cdot \text{IQ} + 35 \cdot \text{Level} + 0.01 \cdot \text{GPA} \cdot \text{IQ} - 10 \cdot \text{GPA} \cdot \text{Level} $$
a
如果固定IQ和GPA,然后我们只考虑level的话,比如说从高中生变到大学生,也就是level的值从0变到1的话,那么对于salary的影响就是:
$$ y1 = 50 + 20 \cdot \text{GPA} + 0.07 \cdot \text{IQ} + 0.01 \cdot \text{GPA} \cdot \text{IQ} $$
$$ y2 = 50 + 20 \cdot \text{GPA} + 0.07 \cdot \text{IQ} + 35 + 0.01 \cdot \text{GPA} \cdot \text{IQ} - 10 \cdot \text{GPA} $$
$\Delta y = y2 - y1 = 35 - 10 \cdot \text{GPA}$
$\Delta y > 0 \Rightarrow \text{GPA} < \dfrac{35}{10} = 3.5$
所以意思就是如果固定IQ的话,只有GPA低于一定程度,大学生的初始薪水才会比高中生多,也就是只有GPA高于一定程度,高中生的初始薪水水平会比高中生多
?outer
outer package:base R Documentation
_O_u_t_e_r _P_r_o_d_u_c_t _o_f _A_r_r_a_y_s
_D_e_s_c_r_i_p_t_i_o_n:
The outer product of the arrays ‘X’ and ‘Y’ is the array ‘A’ with
dimension ‘c(dim(X), dim(Y))’ where element ‘A[c(arrayindex.x,
arrayindex.y)] = FUN(X[arrayindex.x], Y[arrayindex.y], ...)’.
_U_s_a_g_e:
outer(X, Y, FUN = "*", ...)
X %o% Y
_A_r_g_u_m_e_n_t_s:
X, Y: first and second arguments for function ‘FUN’. Typically a
vector or array.
FUN: a function to use on the outer products, found _via_
‘match.fun’ (except for the special case ‘"*"’).
...: optional arguments to be passed to ‘FUN’.
_D_e_t_a_i_l_s:
‘X’ and ‘Y’ must be suitable arguments for ‘FUN’. Each will be
extended by ‘rep’ to length the products of the lengths of ‘X’ and
‘Y’ before ‘FUN’ is called.
‘FUN’ is called with these two extended vectors as arguments (plus
any arguments in ‘...’). It must be a vectorized function (or the
name of one) expecting at least two arguments and returning a
value with the same length as the first (and the second).
Where they exist, the [dim]names of ‘X’ and ‘Y’ will be copied to
the answer, and a dimension assigned which is the concatenation of
the dimensions of ‘X’ and ‘Y’ (or lengths if dimensions do not
exist).
‘FUN = "*"’ is handled as a special case _via_ ‘as.vector(X) %*%
t(as.vector(Y))’, and is intended only for numeric vectors and
arrays.
‘%o%’ is binary operator providing a wrapper for ‘outer(x, y,
"*")’.
_A_u_t_h_o_r(_s):
Jonathan Rougier
_R_e_f_e_r_e_n_c_e_s:
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
Language_. Wadsworth & Brooks/Cole.
_S_e_e _A_l_s_o:
‘%*%’ for usual (_inner_) matrix vector multiplication;
‘kronecker’ which is based on ‘outer’; ‘Vectorize’ for vectorizing
a non-vectorized function.
_E_x_a_m_p_l_e_s:
x <- 1:9; names(x) <- x
# Multiplication & Power Tables
x %o% x
y <- 2:8; names(y) <- paste(y,":", sep = "")
outer(y, x, `^`)
outer(month.abb, 1999:2003, FUN = paste)
## three way multiplication table:
x %o% x %o% y[1:3]
# a
# 固定IQ和GPA的取值,然后考虑预测变量的取值对于响应变量的影响
library(plotly) #用于构建交互式的图表
salary_model <- function(GPA,IQ,LEVEL){
50+20*GPA+0.07*IQ+35*LEVEL+0.01*GPA*IQ-10*GPA*LEVEL
}
gpa_test <- seq(1,5,length=10) # 人工构建从1到5,长度为10的等间隔的IQ值
iq_test <- seq(1,200,length=20) # 人工构建从1到200,长度为20的等间隔的GPA值
print(gpa_test)
print(iq_test)
[1] 1.000000 1.444444 1.888889 2.333333 2.777778 3.222222 3.666667 4.111111 [9] 4.555556 5.000000 [1] 1.00000 23.11111 45.22222 67.33333 89.44444 111.55556 133.66667 [8] 155.77778 177.88889 200.00000
outer(gpa_test,iq_test,LEVEL=1,FUN=salary_model)
t(outer(gpa_test,iq_test,LEVEL=1,FUN=salary_model))
| 95.08000 | 95.91789 | 96.75579 | 97.59368 | 98.43158 | 99.26947 | 100.1074 | 100.9453 | 101.7832 | 102.6211 | 103.4589 | 104.2968 | 105.1347 | 105.9726 | 106.8105 | 107.6484 | 108.4863 | 109.3242 | 110.1621 | 111.0000 |
| 99.52889 | 100.41333 | 101.29778 | 102.18222 | 103.06667 | 103.95111 | 104.8356 | 105.7200 | 106.6044 | 107.4889 | 108.3733 | 109.2578 | 110.1422 | 111.0267 | 111.9111 | 112.7956 | 113.6800 | 114.5644 | 115.4489 | 116.3333 |
| 103.97778 | 104.90877 | 105.83977 | 106.77076 | 107.70175 | 108.63275 | 109.5637 | 110.4947 | 111.4257 | 112.3567 | 113.2877 | 114.2187 | 115.1497 | 116.0807 | 117.0117 | 117.9427 | 118.8737 | 119.8047 | 120.7357 | 121.6667 |
| 108.42667 | 109.40421 | 110.38175 | 111.35930 | 112.33684 | 113.31439 | 114.2919 | 115.2695 | 116.2470 | 117.2246 | 118.2021 | 119.1796 | 120.1572 | 121.1347 | 122.1123 | 123.0898 | 124.0674 | 125.0449 | 126.0225 | 127.0000 |
| 112.87556 | 113.89965 | 114.92374 | 115.94784 | 116.97193 | 117.99602 | 119.0201 | 120.0442 | 121.0683 | 122.0924 | 123.1165 | 124.1406 | 125.1647 | 126.1888 | 127.2129 | 128.2370 | 129.2611 | 130.2851 | 131.3092 | 132.3333 |
| 117.32444 | 118.39509 | 119.46573 | 120.53637 | 121.60702 | 122.67766 | 123.7483 | 124.8189 | 125.8896 | 126.9602 | 128.0309 | 129.1015 | 130.1722 | 131.2428 | 132.3135 | 133.3841 | 134.4547 | 135.5254 | 136.5960 | 137.6667 |
| 121.77333 | 122.89053 | 124.00772 | 125.12491 | 126.24211 | 127.35930 | 128.4765 | 129.5937 | 130.7109 | 131.8281 | 132.9453 | 134.0625 | 135.1796 | 136.2968 | 137.4140 | 138.5312 | 139.6484 | 140.7656 | 141.8828 | 143.0000 |
| 126.22222 | 127.38596 | 128.54971 | 129.71345 | 130.87719 | 132.04094 | 133.2047 | 134.3684 | 135.5322 | 136.6959 | 137.8596 | 139.0234 | 140.1871 | 141.3509 | 142.5146 | 143.6784 | 144.8421 | 146.0058 | 147.1696 | 148.3333 |
| 130.67111 | 131.88140 | 133.09170 | 134.30199 | 135.51228 | 136.72257 | 137.9329 | 139.1432 | 140.3535 | 141.5637 | 142.7740 | 143.9843 | 145.1946 | 146.4049 | 147.6152 | 148.8255 | 150.0358 | 151.2461 | 152.4564 | 153.6667 |
| 135.12000 | 136.37684 | 137.63368 | 138.89053 | 140.14737 | 141.40421 | 142.6611 | 143.9179 | 145.1747 | 146.4316 | 147.6884 | 148.9453 | 150.2021 | 151.4589 | 152.7158 | 153.9726 | 155.2295 | 156.4863 | 157.7432 | 159.0000 |
| 95.08000 | 99.52889 | 103.9778 | 108.4267 | 112.8756 | 117.3244 | 121.7733 | 126.2222 | 130.6711 | 135.1200 |
| 95.91789 | 100.41333 | 104.9088 | 109.4042 | 113.8996 | 118.3951 | 122.8905 | 127.3860 | 131.8814 | 136.3768 |
| 96.75579 | 101.29778 | 105.8398 | 110.3818 | 114.9237 | 119.4657 | 124.0077 | 128.5497 | 133.0917 | 137.6337 |
| 97.59368 | 102.18222 | 106.7708 | 111.3593 | 115.9478 | 120.5364 | 125.1249 | 129.7135 | 134.3020 | 138.8905 |
| 98.43158 | 103.06667 | 107.7018 | 112.3368 | 116.9719 | 121.6070 | 126.2421 | 130.8772 | 135.5123 | 140.1474 |
| 99.26947 | 103.95111 | 108.6327 | 113.3144 | 117.9960 | 122.6777 | 127.3593 | 132.0409 | 136.7226 | 141.4042 |
| 100.10737 | 104.83556 | 109.5637 | 114.2919 | 119.0201 | 123.7483 | 128.4765 | 133.2047 | 137.9329 | 142.6611 |
| 100.94526 | 105.72000 | 110.4947 | 115.2695 | 120.0442 | 124.8189 | 129.5937 | 134.3684 | 139.1432 | 143.9179 |
| 101.78316 | 106.60444 | 111.4257 | 116.2470 | 121.0683 | 125.8896 | 130.7109 | 135.5322 | 140.3535 | 145.1747 |
| 102.62105 | 107.48889 | 112.3567 | 117.2246 | 122.0924 | 126.9602 | 131.8281 | 136.6959 | 141.5637 | 146.4316 |
| 103.45895 | 108.37333 | 113.2877 | 118.2021 | 123.1165 | 128.0309 | 132.9453 | 137.8596 | 142.7740 | 147.6884 |
| 104.29684 | 109.25778 | 114.2187 | 119.1796 | 124.1406 | 129.1015 | 134.0625 | 139.0234 | 143.9843 | 148.9453 |
| 105.13474 | 110.14222 | 115.1497 | 120.1572 | 125.1647 | 130.1722 | 135.1796 | 140.1871 | 145.1946 | 150.2021 |
| 105.97263 | 111.02667 | 116.0807 | 121.1347 | 126.1888 | 131.2428 | 136.2968 | 141.3509 | 146.4049 | 151.4589 |
| 106.81053 | 111.91111 | 117.0117 | 122.1123 | 127.2129 | 132.3135 | 137.4140 | 142.5146 | 147.6152 | 152.7158 |
| 107.64842 | 112.79556 | 117.9427 | 123.0898 | 128.2370 | 133.3841 | 138.5312 | 143.6784 | 148.8255 | 153.9726 |
| 108.48632 | 113.68000 | 118.8737 | 124.0674 | 129.2611 | 134.4547 | 139.6484 | 144.8421 | 150.0358 | 155.2295 |
| 109.32421 | 114.56444 | 119.8047 | 125.0449 | 130.2851 | 135.5254 | 140.7656 | 146.0058 | 151.2461 | 156.4863 |
| 110.16211 | 115.44889 | 120.7357 | 126.0225 | 131.3092 | 136.5960 | 141.8828 | 147.1696 | 152.4564 | 157.7432 |
| 111.00000 | 116.33333 | 121.6667 | 127.0000 | 132.3333 | 137.6667 | 143.0000 | 148.3333 | 153.6667 | 159.0000 |
?add_surface
?plot_ly
add_trace package:plotly R Documentation
_A_d_d _t_r_a_c_e(_s) _t_o _a _p_l_o_t_l_y _v_i_s_u_a_l_i_z_a_t_i_o_n
_D_e_s_c_r_i_p_t_i_o_n:
Add trace(s) to a plotly visualization
_U_s_a_g_e:
add_trace(p, ..., data = NULL, inherit = TRUE)
add_markers(p, x = NULL, y = NULL, z = NULL, ..., data = NULL, inherit = TRUE)
add_text(
p,
x = NULL,
y = NULL,
z = NULL,
text = NULL,
...,
data = NULL,
inherit = TRUE
)
add_paths(p, x = NULL, y = NULL, z = NULL, ..., data = NULL, inherit = TRUE)
add_lines(p, x = NULL, y = NULL, z = NULL, ..., data = NULL, inherit = TRUE)
add_segments(
p,
x = NULL,
y = NULL,
xend = NULL,
yend = NULL,
...,
data = NULL,
inherit = TRUE
)
add_polygons(p, x = NULL, y = NULL, ..., data = NULL, inherit = TRUE)
add_sf(p, ..., x = ~x, y = ~y, data = NULL, inherit = TRUE)
add_table(p, ..., rownames = TRUE, data = NULL, inherit = TRUE)
add_ribbons(
p,
x = NULL,
ymin = NULL,
ymax = NULL,
...,
data = NULL,
inherit = TRUE
)
add_image(p, z = NULL, colormodel = NULL, ..., data = NULL, inherit = TRUE)
add_area(p, r = NULL, theta = NULL, t = NULL, ..., data = NULL, inherit = TRUE)
add_pie(p, values = NULL, labels = NULL, ..., data = NULL, inherit = TRUE)
add_bars(p, x = NULL, y = NULL, ..., data = NULL, inherit = TRUE)
add_histogram(p, x = NULL, y = NULL, ..., data = NULL, inherit = TRUE)
add_histogram2d(
p,
x = NULL,
y = NULL,
z = NULL,
...,
data = NULL,
inherit = TRUE
)
add_histogram2dcontour(
p,
x = NULL,
y = NULL,
z = NULL,
...,
data = NULL,
inherit = TRUE
)
add_heatmap(p, x = NULL, y = NULL, z = NULL, ..., data = NULL, inherit = TRUE)
add_contour(p, z = NULL, ..., data = NULL, inherit = TRUE)
add_boxplot(p, x = NULL, y = NULL, ..., data = NULL, inherit = TRUE)
add_surface(p, z = NULL, ..., data = NULL, inherit = TRUE)
add_mesh(p, x = NULL, y = NULL, z = NULL, ..., data = NULL, inherit = TRUE)
add_scattergeo(p, ...)
add_choropleth(p, z = NULL, ..., data = NULL, inherit = TRUE)
_A_r_g_u_m_e_n_t_s:
p: a plotly object
...: Arguments (i.e., attributes) passed along to the trace
‘type’. See ‘schema()’ for a list of acceptable attributes
for a given trace ‘type’ (by going to ‘traces’ -> ‘type’ ->
‘attributes’). Note that attributes provided at this level
may override other arguments (e.g. ‘plot_ly(x = 1:10, y =
1:10, color = I("red"), marker = list(color = "blue"))’).
data: A data frame (optional) or crosstalk::SharedData object.
inherit: inherit attributes from ‘plot_ly()’?
x: the x variable.
y: the y variable.
z: a numeric matrix (unless ‘add_image()’, which wants a raster
object, see ‘as.raster()’).
text: textual labels.
xend: "final" x position (in this context, x represents "start")
yend: "final" y position (in this context, y represents "start")
rownames: whether or not to display the rownames of ‘data’.
ymin: a variable used to define the lower boundary of a polygon.
ymax: a variable used to define the upper boundary of a polygon.
colormodel: Sets the colormodel for image traces if ‘z’ is not a raster
object. If ‘z’ is a raster object (see ‘as.raster()’), the
‘'rgba'’ colormodel is always used.
r: Sets the radial coordinates.
theta: Sets the angular coordinates.
t: Deprecated. Use ‘theta’ instead.
values: the value to associated with each slice of the pie.
labels: the labels (categories) corresponding to ‘values’.
_A_u_t_h_o_r(_s):
Carson Sievert
_R_e_f_e_r_e_n_c_e_s:
<https://plotly-r.com/overview.html>
<https://plotly.com/r/>
<https://plotly.com/r/reference/>
_S_e_e _A_l_s_o:
‘plot_ly()’
_E_x_a_m_p_l_e_s:
# the `plot_ly()` function initiates an object, and if no trace type
# is specified, it sets a sensible default
p <- plot_ly(economics, x = ~date, y = ~uempmed)
p
# some `add_*()` functions are a specific case of a trace type
# for example, `add_markers()` is a scatter trace with mode of markers
add_markers(p)
# scatter trace with mode of text
add_text(p, text = "%")
# scatter trace with mode of lines
add_paths(p)
# like `add_paths()`, but ensures points are connected according to `x`
add_lines(p)
# if you prefer to work with plotly.js more directly, can always
# use `add_trace()` and specify the type yourself
add_trace(p, type = "scatter", mode = "markers+lines")
# mappings provided to `plot_ly()` are "global", but can be overwritten
plot_ly(economics, x = ~date, y = ~uempmed, color = I("red"), showlegend = FALSE) %>%
add_lines() %>%
add_markers(color = ~pop)
# a number of `add_*()` functions are special cases of the scatter trace
plot_ly(economics, x = ~date) %>%
add_ribbons(ymin = ~pce - 1e3, ymax = ~pce + 1e3)
# use `group_by()` (or `group2NA()`) to apply visual mapping
# once per group (e.g. one line per group)
txhousing %>%
group_by(city) %>%
plot_ly(x = ~date, y = ~median) %>%
add_lines(color = I("black"))
## Not run:
# use `add_sf()` or `add_polygons()` to create geo-spatial maps
# http://blog.cpsievert.me/2018/03/30/visualizing-geo-spatial-data-with-sf-and-plotly/
if (requireNamespace("sf", quietly = TRUE)) {
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
plot_ly() %>% add_sf(data = nc)
}
# univariate summary statistics
plot_ly(mtcars, x = ~factor(vs), y = ~mpg) %>%
add_boxplot()
plot_ly(mtcars, x = ~factor(vs), y = ~mpg) %>%
add_trace(type = "violin")
# `add_histogram()` does binning for you...
mtcars %>%
plot_ly(x = ~factor(vs)) %>%
add_histogram()
# ...but you can 'pre-compute' bar heights in R
mtcars %>%
dplyr::count(vs) %>%
plot_ly(x = ~vs, y = ~n) %>%
add_bars()
# the 2d analogy of add_histogram() is add_histogram2d()/add_histogram2dcontour()
library(MASS)
(p <- plot_ly(geyser, x = ~waiting, y = ~duration))
add_histogram2d(p)
add_histogram2dcontour(p)
# the 2d analogy of add_bars() is add_heatmap()/add_contour()
# (i.e., bin counts must be pre-specified)
den <- kde2d(geyser$waiting, geyser$duration)
p <- plot_ly(x = den$x, y = den$y, z = den$z)
add_heatmap(p)
add_contour(p)
# `add_table()` makes it easy to map a data frame to the table trace type
plot_ly(economics) %>%
add_table()
# pie charts!
ds <- data.frame(labels = c("A", "B", "C"), values = c(10, 40, 60))
plot_ly(ds, labels = ~labels, values = ~values) %>%
add_pie() %>%
layout(title = "Basic Pie Chart using Plotly")
data(wind)
plot_ly(wind, r = ~r, theta = ~t) %>%
add_area(color = ~nms) %>%
layout(
polar = list(
radialaxis = list(ticksuffix = "%"),
angularaxis = list(rotation = 90)
)
)
# ------------------------------------------------------------
# 3D chart types
# ------------------------------------------------------------
plot_ly(z = ~volcano) %>%
add_surface()
plot_ly(x = c(0, 0, 1), y = c(0, 1, 0), z = c(0, 0, 0)) %>%
add_mesh()
## End(Not run)
plot_ly package:plotly R Documentation
_I_n_i_t_i_a_t_e _a _p_l_o_t_l_y _v_i_s_u_a_l_i_z_a_t_i_o_n
_D_e_s_c_r_i_p_t_i_o_n:
This function maps R objects to plotly.js, an (MIT licensed)
web-based interactive charting library. It provides abstractions
for doing common things (e.g. mapping data values to fill colors
(via ‘color’) or creating animations (via ‘frame’)) and sets some
different defaults to make the interface feel more 'R-like' (i.e.,
closer to ‘plot()’ and ‘ggplot2::qplot()’).
_U_s_a_g_e:
plot_ly(
data = data.frame(),
...,
type = NULL,
name,
color,
colors = NULL,
alpha = NULL,
stroke,
strokes = NULL,
alpha_stroke = 1,
size,
sizes = c(10, 100),
span,
spans = c(1, 20),
symbol,
symbols = NULL,
linetype,
linetypes = NULL,
split,
frame,
width = NULL,
height = NULL,
source = "A"
)
_A_r_g_u_m_e_n_t_s:
data: A data frame (optional) or crosstalk::SharedData object.
...: Arguments (i.e., attributes) passed along to the trace
‘type’. See ‘schema()’ for a list of acceptable attributes
for a given trace ‘type’ (by going to ‘traces’ -> ‘type’ ->
‘attributes’). Note that attributes provided at this level
may override other arguments (e.g. ‘plot_ly(x = 1:10, y =
1:10, color = I("red"), marker = list(color = "blue"))’).
type: A character string specifying the trace type (e.g.
‘"scatter"’, ‘"bar"’, ‘"box"’, etc). If specified, it
_always_ creates a trace, otherwise
name: Values mapped to the trace's name attribute. Since a trace
can only have one name, this argument acts very much like
‘split’ in that it creates one trace for every unique value.
color: Values mapped to relevant 'fill-color' attribute(s) (e.g.
fillcolor, marker.color, textfont.color, etc.). The mapping
from data values to color codes may be controlled using
‘colors’ and ‘alpha’, or avoided altogether via ‘I()’ (e.g.,
‘color = I("red")’). Any color understood by
‘grDevices::col2rgb()’ may be used in this way.
colors: Either a colorbrewer2.org palette name (e.g. "YlOrRd" or
"Blues"), or a vector of colors to interpolate in hexadecimal
"#RRGGBB" format, or a color interpolation function like
‘colorRamp()’.
alpha: A number between 0 and 1 specifying the alpha channel applied
to ‘color’. Defaults to 0.5 when mapping to fillcolor and 1
otherwise.
stroke: Similar to ‘color’, but values are mapped to relevant
'stroke-color' attribute(s) (e.g., marker.line.color and
line.color for filled polygons). If not specified, ‘stroke’
inherits from ‘color’.
strokes: Similar to ‘colors’, but controls the ‘stroke’ mapping.
alpha_stroke: Similar to ‘alpha’, but applied to ‘stroke’.
size: (Numeric) values mapped to relevant 'fill-size' attribute(s)
(e.g., marker.size, textfont.size, and error_x.width). The
mapping from data values to symbols may be controlled using
‘sizes’, or avoided altogether via ‘I()’ (e.g., ‘size =
I(30)’).
sizes: A numeric vector of length 2 used to scale ‘size’ to pixels.
span: (Numeric) values mapped to relevant 'stroke-size'
attribute(s) (e.g., marker.line.width, line.width for filled
polygons, and error_x.thickness) The mapping from data values
to symbols may be controlled using ‘spans’, or avoided
altogether via ‘I()’ (e.g., ‘span = I(30)’).
spans: A numeric vector of length 2 used to scale ‘span’ to pixels.
symbol: (Discrete) values mapped to marker.symbol. The mapping from
data values to symbols may be controlled using ‘symbols’, or
avoided altogether via ‘I()’ (e.g., ‘symbol =
I("pentagon")’). Any pch value or symbol name may be used in
this way.
symbols: A character vector of pch values or symbol names.
linetype: (Discrete) values mapped to line.dash. The mapping from data
values to symbols may be controlled using ‘linetypes’, or
avoided altogether via ‘I()’ (e.g., ‘linetype = I("dash")’).
Any ‘lty’ (see par) value or dash name may be used in this
way.
linetypes: A character vector of ‘lty’ values or dash names
split: (Discrete) values used to create multiple traces (one trace
per value).
frame: (Discrete) values used to create animation frames.
width: Width in pixels (optional, defaults to automatic sizing).
height: Height in pixels (optional, defaults to automatic sizing).
source: a character string of length 1. Match the value of this
string with the source argument in ‘event_data()’ to retrieve
the event data corresponding to a specific plot (shiny apps
can have multiple plots).
_D_e_t_a_i_l_s:
Unless ‘type’ is specified, this function just initiates a plotly
object with 'global' attributes that are passed onto downstream
uses of ‘add_trace()’ (or similar). A formula must always be used
when referencing column name(s) in ‘data’ (e.g. ‘plot_ly(mtcars, x
= ~wt)’). Formulas are optional when supplying values directly,
but they do help inform default axis/scale titles (e.g.,
‘plot_ly(x = mtcars$wt)’ vs ‘plot_ly(x = ~mtcars$wt)’)
_A_u_t_h_o_r(_s):
Carson Sievert
_R_e_f_e_r_e_n_c_e_s:
<https://plotly-r.com/overview.html>
_S_e_e _A_l_s_o:
• For initializing a plotly-geo object: ‘plot_geo()’
• For initializing a plotly-mapbox object: ‘plot_mapbox()’
• For translating a ggplot2 object to a plotly object:
‘ggplotly()’
• For modifying any plotly object: ‘layout()’, ‘add_trace()’,
‘style()’
• For linked brushing: ‘highlight()’
• For arranging multiple plots: ‘subplot()’,
‘crosstalk::bscols()’
• For inspecting plotly objects: ‘plotly_json()’
• For quick, accurate, and searchable plotly.js reference:
‘schema()’
_E_x_a_m_p_l_e_s:
## Not run:
# plot_ly() tries to create a sensible plot based on the information you
# give it. If you don't provide a trace type, plot_ly() will infer one.
plot_ly(economics, x = ~pop)
plot_ly(economics, x = ~date, y = ~pop)
# plot_ly() doesn't require data frame(s), which allows one to take
# advantage of trace type(s) designed specifically for numeric matrices
plot_ly(z = ~volcano)
plot_ly(z = ~volcano, type = "surface")
# plotly has a functional interface: every plotly function takes a plotly
# object as it's first input argument and returns a modified plotly object
add_lines(plot_ly(economics, x = ~date, y = ~unemploy/pop))
# To make code more readable, plotly imports the pipe operator from magrittr
economics %>% plot_ly(x = ~date, y = ~unemploy/pop) %>% add_lines()
# Attributes defined via plot_ly() set 'global' attributes that
# are carried onto subsequent traces, but those may be over-written
plot_ly(economics, x = ~date, color = I("black")) %>%
add_lines(y = ~uempmed) %>%
add_lines(y = ~psavert, color = I("red"))
# Attributes are documented in the figure reference -> https://plotly.com/r/reference
# You might notice plot_ly() has named arguments that aren't in this figure
# reference. These arguments make it easier to map abstract data values to
# visual attributes.
p <- plot_ly(palmerpenguins::penguins, x = ~bill_length_mm, y = ~body_mass_g)
add_markers(p, color = ~bill_depth_mm, size = ~bill_depth_mm)
add_markers(p, color = ~species)
add_markers(p, color = ~species, colors = "Set1")
add_markers(p, symbol = ~species)
add_paths(p, linetype = ~species)
## End(Not run)
college_test <- (outer(gpa_test,iq_test,LEVEL=1,FUN=salary_model))
college_test
| 95.08000 | 96.84889 | 98.61778 | 100.3867 | 102.1556 | 103.9244 | 105.6933 | 107.4622 | 109.2311 | 111.0000 |
| 99.52889 | 101.39605 | 103.26321 | 105.1304 | 106.9975 | 108.8647 | 110.7319 | 112.5990 | 114.4662 | 116.3333 |
| 103.97778 | 105.94321 | 107.90864 | 109.8741 | 111.8395 | 113.8049 | 115.7704 | 117.7358 | 119.7012 | 121.6667 |
| 108.42667 | 110.49037 | 112.55407 | 114.6178 | 116.6815 | 118.7452 | 120.8089 | 122.8726 | 124.9363 | 127.0000 |
| 112.87556 | 115.03753 | 117.19951 | 119.3615 | 121.5235 | 123.6854 | 125.8474 | 128.0094 | 130.1714 | 132.3333 |
| 117.32444 | 119.58469 | 121.84494 | 124.1052 | 126.3654 | 128.6257 | 130.8859 | 133.1462 | 135.4064 | 137.6667 |
| 121.77333 | 124.13185 | 126.49037 | 128.8489 | 131.2074 | 133.5659 | 135.9244 | 138.2830 | 140.6415 | 143.0000 |
| 126.22222 | 128.67901 | 131.13580 | 133.5926 | 136.0494 | 138.5062 | 140.9630 | 143.4198 | 145.8765 | 148.3333 |
| 130.67111 | 133.22617 | 135.78123 | 138.3363 | 140.8914 | 143.4464 | 146.0015 | 148.5565 | 151.1116 | 153.6667 |
| 135.12000 | 137.77333 | 140.42667 | 143.0800 | 145.7333 | 148.3867 | 151.0400 | 153.6933 | 156.3467 | 159.0000 |
# a
# 固定IQ和GPA的取值,然后考虑预测变量的取值对于响应变量的影响
library(plotly) #用于构建交互式的图表
salary_model <- function(GPA,IQ,LEVEL){
50+20*GPA+0.07*IQ+35*LEVEL+0.01*GPA*IQ-10*GPA*LEVEL
}
gpa_test <- seq(1,5,length=10) # 人工构建从1到5,长度为10的等间隔的IQ值
iq_test <- seq(1,200,length=10) # 人工构建从1到200,长度为10的等间隔的GPA值
college_test <- t(outer(gpa_test,iq_test,LEVEL=1,FUN=salary_model)) #从构建的gpa以及iq值中,预测出的salary值,LEVEL=1表示大学学历,对获取的各种gpaxiq值的组会
high_school_test <- t(outer(gpa_test,iq_test,LEVEL=0,FUN=salary_model)) #同上,注意此处需要进行转置,否则获得的结果在x轴以及y轴上是反过来的!!!
plot_ly(x=gpa_test,y=iq_test) |>
add_surface( # 添加一个 3D 表面图
x=gpa_test,y=iq_test,
z=~college_test, # 指定 z 轴的数据为 College 学生的薪水
colorscale=list(c(0, 1), c("rgb(173,216,230)", "rgb(0,0,139)")), # 指定颜色渐变,浅蓝色到深蓝色
colorbar=list(title="college") # 指定颜色条的标题
) |>
add_surface( # 添加另一个 3D 表面图
x=gpa_test,y=iq_test,
z=~high_school_test,
colorscale=list(c(0, 1), c("rgb(255,182,193)", "rgb(139,0,0)")), # 浅红色到深红色
colorbar=list(title="high school")
) |>
layout( # 设置图表布局
scene=list( # 指定 3D 场景的轴标签
xaxis = list(title = "GPA"),
yaxis = list(title = "IQ"),
zaxis = list(title = "Salary")
)
)
结论是和前面数学表达式中获取的一致的
我们以第1位作为测试,也就是gpa是1,然后iq是23.11111的情况,对于college也就是大学生来说薪资是96.84,然后我们查看上面的矩阵的值:
发现实际上是有问题的,所以得在outer之后的数据中进行转置,或者是在add_surface中输入x以及y的值的时候,反过来定义
我们此处进行转置,然后结果确实就验证正确了
# b
salary_model(GPA = 4.0,IQ = 110,LEVEL = 1) # 预测大学学历的薪水,是137.1
# c
# 对于交叉项的效应,我们同样可以通过lm函数来进行计算
# 就像前面实践lab中的那样,summary(lm(medv ~ lstat * age, data = Boston)) ,我们需要检验这个交叉项的显著值,我们才能判定这个交叉项是否对于模型的预测有重要的影响
# 所以这个问题是false!
4,
a
首先依据课件ppt上的说法:
# 越灵活。参数越多的模型,其在训练集上的表现会越好,但是在测试集上的表现可能会越差,这就是过拟合的问题
# 所以立方项的模型拟合会比线性模型拟合具有更低的训练集的RSS,立方项的拟合只需要在高于一次项的项上的系数归0,就可以得到一个线性模型,总之就是更加灵活
# 所以立方项的模型拟合会比线性模型拟合具有更低的训练集的RSS
# b
# 测试集部分的RSS,实际上会呈现出U形曲线,所以无法确定
# c
# 还是立方项的拟合模型会导致更低的训练集的RSS,道理和前面一样,因为训练集上拟合的话,立方项的模型永远比线性模型拟合得更好
# d
# 无法回答,因为到底Y和X之间的非线性或者线性关系如何无法确定
9,
# a
library(ISLR2)
pairs(Auto)
# b
names(Auto)
# 'mpg''cylinders''displacement''horsepower''weight''acceleration''year''origin''name'
Auto %>% select(-name)%>%cor() # 去掉最后一列非数值变量列
- 'mpg'
- 'cylinders'
- 'displacement'
- 'horsepower'
- 'weight'
- 'acceleration'
- 'year'
- 'origin'
- 'name'
| mpg | cylinders | displacement | horsepower | weight | acceleration | year | origin | |
|---|---|---|---|---|---|---|---|---|
| mpg | 1.0000000 | -0.7776175 | -0.8051269 | -0.7784268 | -0.8322442 | 0.4233285 | 0.5805410 | 0.5652088 |
| cylinders | -0.7776175 | 1.0000000 | 0.9508233 | 0.8429834 | 0.8975273 | -0.5046834 | -0.3456474 | -0.5689316 |
| displacement | -0.8051269 | 0.9508233 | 1.0000000 | 0.8972570 | 0.9329944 | -0.5438005 | -0.3698552 | -0.6145351 |
| horsepower | -0.7784268 | 0.8429834 | 0.8972570 | 1.0000000 | 0.8645377 | -0.6891955 | -0.4163615 | -0.4551715 |
| weight | -0.8322442 | 0.8975273 | 0.9329944 | 0.8645377 | 1.0000000 | -0.4168392 | -0.3091199 | -0.5850054 |
| acceleration | 0.4233285 | -0.5046834 | -0.5438005 | -0.6891955 | -0.4168392 | 1.0000000 | 0.2903161 | 0.2127458 |
| year | 0.5805410 | -0.3456474 | -0.3698552 | -0.4163615 | -0.3091199 | 0.2903161 | 1.0000000 | 0.1815277 |
| origin | 0.5652088 | -0.5689316 | -0.6145351 | -0.4551715 | -0.5850054 | 0.2127458 | 0.1815277 | 1.0000000 |
# c
lm.fit <- lm(mpg ~ . -name,data = Auto)
summary(lm.fit)
Call:
lm(formula = mpg ~ . - name, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-9.5903 -2.1565 -0.1169 1.8690 13.0604
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.218435 4.644294 -3.707 0.00024 ***
cylinders -0.493376 0.323282 -1.526 0.12780
displacement 0.019896 0.007515 2.647 0.00844 **
horsepower -0.016951 0.013787 -1.230 0.21963
weight -0.006474 0.000652 -9.929 < 2e-16 ***
acceleration 0.080576 0.098845 0.815 0.41548
year 0.750773 0.050973 14.729 < 2e-16 ***
origin 1.426141 0.278136 5.127 4.67e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.328 on 384 degrees of freedom
Multiple R-squared: 0.8215, Adjusted R-squared: 0.8182
F-statistic: 252.4 on 7 and 384 DF, p-value: < 2.2e-16
# 从上面我们可以很清楚地看到响应变量和预测变量之间的关系,从p值以及系数的估计值的正负综合来判断
# 有统计学显著性的是displacement、weight、year 、origin
# year的系数是正的,表明mpg平均每年增加0.750773(单位)
# d
par(mfrow = c(2, 2))
plot(lm.fit)
残差与拟合值图
残差在零线周围随机分布,这是一个好兆头。没有明显的模式或曲线表明关系中存在非线性。然而,有几个残差相对较大的点(正负都有),可能值得进一步调查。
残差的 Q-Q 图
Q-Q 图显示残差大致沿着对角线分布,表明它们大致呈正态分布。这是线性回归的许多统计检验的重要假设。在尾部有几个点稍微偏离直线,但这很常见,并不一定表明正态性严重违反。
尺度-位置图
尺度-位置图显示绝对标准化残差的平方根与拟合值的关系。红线大致平坦,表明残差的方差在拟合值范围内相对稳定。这被称为同方差性,是线性回归的另一个关键假设。在较高拟合值处方差有轻微增加,但并不明显。
残差与杠杆值图
此图有助于识别可能对回归模型产生不成比例影响的高杠杆点。大多数点落在典型范围内,但有几个点的杠杆值较高(大约在 0.15-0.20 之间)。这些点应进行检查,以确定它们是否对模型拟合产生过大影响。库克距离等高线(虚线)有助于识别对模型有较大影响的点;等高线内的点影响较小。
问题总结
拟合问题:总体来看,模型拟合良好,残差随机分布且大致呈正态分布。在较高拟合值处方差有轻微增加,但并不严重。
异常大的离群值:有几个残差 magnitude 较大的点,但没有特别异常的离群值。
高杠杆观测值:有几个观测值的杠杆值较高,但没有特别高。有必要检查这些点,以确保它们没有驱动模型结果。
# e
# 有7个预测变量,理论上来说,有C72 = 7!/(2! * 5!) = 21个两两相关的交叉项
# 加载必要的库
library(dplyr)
# 定义变量
variables <- c('cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'year', 'origin')
# 生成所有两两变量组合的交叉项
combinations <- combn(variables, 2, simplify = FALSE)
# 循环遍历每个组合,拟合线性模型并输出摘要
for (combo in combinations) {
interaction_term <- paste(combo, collapse = ":")
formula <- as.formula(paste("mpg ~ . -name +", interaction_term))
lm.fit <- lm(formula, data = Auto)
cat("\n\n### Interaction Term:", interaction_term, "\n")
print(summary(lm.fit))
}
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
### Interaction Term: cylinders:displacement
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-11.6081 -1.7833 -0.0465 1.6821 12.2617
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.7096590 4.6858582 -0.578 0.563426
cylinders -2.6962123 0.4094916 -6.584 1.51e-10 ***
displacement -0.0774797 0.0141535 -5.474 7.96e-08 ***
horsepower -0.0476026 0.0133736 -3.559 0.000418 ***
weight -0.0052339 0.0006253 -8.370 1.10e-15 ***
acceleration 0.0597997 0.0918038 0.651 0.515188
year 0.7594500 0.0473354 16.044 < 2e-16 ***
origin 0.7087399 0.2736917 2.590 0.009976 **
cylinders:displacement 0.0136081 0.0017209 7.907 2.84e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.089 on 383 degrees of freedom
Multiple R-squared: 0.8465, Adjusted R-squared: 0.8433
F-statistic: 264.1 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: cylinders:horsepower
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-9.2399 -1.6871 -0.0511 1.2858 11.9380
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.7025260 4.9115648 2.383 0.017676 *
cylinders -4.3060695 0.4580950 -9.400 < 2e-16 ***
displacement -0.0013925 0.0069110 -0.201 0.840426
horsepower -0.3156601 0.0306339 -10.304 < 2e-16 ***
weight -0.0038948 0.0006231 -6.250 1.09e-09 ***
acceleration -0.1703028 0.0901427 -1.889 0.059612 .
year 0.7393193 0.0448736 16.476 < 2e-16 ***
origin 0.9031644 0.2496880 3.617 0.000338 ***
cylinders:horsepower 0.0402008 0.0037856 10.619 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.929 on 383 degrees of freedom
Multiple R-squared: 0.8621, Adjusted R-squared: 0.8592
F-statistic: 299.3 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: cylinders:weight
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-10.9484 -1.7133 -0.1809 1.4530 12.4137
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.3143478 5.0076737 1.461 0.14494
cylinders -5.0347425 0.5795767 -8.687 < 2e-16 ***
displacement 0.0156444 0.0068409 2.287 0.02275 *
horsepower -0.0314213 0.0126216 -2.489 0.01322 *
weight -0.0150329 0.0011125 -13.513 < 2e-16 ***
acceleration 0.1006438 0.0897944 1.121 0.26306
year 0.7813453 0.0464139 16.834 < 2e-16 ***
origin 0.8030154 0.2617333 3.068 0.00231 **
cylinders:weight 0.0015058 0.0001657 9.088 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.022 on 383 degrees of freedom
Multiple R-squared: 0.8531, Adjusted R-squared: 0.8501
F-statistic: 278.1 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: cylinders:acceleration
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-8.6499 -2.0696 -0.1522 1.7965 12.5285
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.424e+01 5.741e+00 -5.964 5.60e-09 ***
cylinders 2.940e+00 7.810e-01 3.764 0.000193 ***
displacement 8.611e-03 7.677e-03 1.122 0.262667
horsepower -3.805e-02 1.411e-02 -2.697 0.007308 **
weight -5.238e-03 6.844e-04 -7.653 1.61e-13 ***
acceleration 1.097e+00 2.324e-01 4.719 3.33e-06 ***
year 7.654e-01 4.966e-02 15.412 < 2e-16 ***
origin 1.269e+00 2.725e-01 4.657 4.44e-06 ***
cylinders:acceleration -2.141e-01 4.458e-02 -4.802 2.25e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.236 on 383 degrees of freedom
Multiple R-squared: 0.8316, Adjusted R-squared: 0.8281
F-statistic: 236.4 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: cylinders:year
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-8.3346 -2.0533 -0.1177 1.6257 12.0879
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.290e+01 1.256e+01 -7.399 8.75e-13 ***
cylinders 1.469e+01 2.378e+00 6.178 1.66e-09 ***
displacement 1.175e-02 7.259e-03 1.619 0.1063
horsepower -2.765e-02 1.322e-02 -2.092 0.0371 *
weight -6.242e-03 6.212e-04 -10.049 < 2e-16 ***
acceleration 1.213e-01 9.423e-02 1.287 0.1988
year 1.734e+00 1.601e-01 10.826 < 2e-16 ***
origin 1.375e+00 2.647e-01 5.197 3.30e-07 ***
cylinders:year -1.949e-01 3.027e-02 -6.439 3.60e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.165 on 383 degrees of freedom
Multiple R-squared: 0.8389, Adjusted R-squared: 0.8356
F-statistic: 249.3 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: cylinders:origin
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-9.7908 -2.1679 -0.1234 1.9043 13.0306
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.792e+01 5.169e+00 -3.466 0.000587 ***
cylinders -3.572e-01 5.459e-01 -0.654 0.513327
displacement 1.919e-02 7.865e-03 2.439 0.015169 *
horsepower -1.656e-02 1.386e-02 -1.195 0.232926
weight -6.462e-03 6.541e-04 -9.879 < 2e-16 ***
acceleration 8.068e-02 9.896e-02 0.815 0.415449
year 7.527e-01 5.141e-02 14.640 < 2e-16 ***
origin 1.838e+00 1.358e+00 1.353 0.176836
cylinders:origin -9.985e-02 3.223e-01 -0.310 0.756898
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.332 on 383 degrees of freedom
Multiple R-squared: 0.8215, Adjusted R-squared: 0.8178
F-statistic: 220.4 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: displacement:horsepower
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-8.7010 -1.6009 -0.0967 1.4119 12.6734
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.894e+00 4.302e+00 -0.440 0.66007
cylinders 6.466e-01 3.017e-01 2.143 0.03275 *
displacement -7.487e-02 1.092e-02 -6.859 2.80e-11 ***
horsepower -1.975e-01 2.052e-02 -9.624 < 2e-16 ***
weight -3.147e-03 6.475e-04 -4.861 1.71e-06 ***
acceleration -2.131e-01 9.062e-02 -2.351 0.01921 *
year 7.379e-01 4.463e-02 16.534 < 2e-16 ***
origin 6.891e-01 2.527e-01 2.727 0.00668 **
displacement:horsepower 5.236e-04 4.813e-05 10.878 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.912 on 383 degrees of freedom
Multiple R-squared: 0.8636, Adjusted R-squared: 0.8608
F-statistic: 303.1 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: displacement:weight
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-9.9027 -1.8092 -0.0946 1.5549 12.1687
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.389e+00 4.301e+00 -1.253 0.2109
cylinders 1.175e-01 2.943e-01 0.399 0.6899
displacement -6.837e-02 1.104e-02 -6.193 1.52e-09 ***
horsepower -3.280e-02 1.238e-02 -2.649 0.0084 **
weight -1.064e-02 7.136e-04 -14.915 < 2e-16 ***
acceleration 6.724e-02 8.805e-02 0.764 0.4455
year 7.852e-01 4.553e-02 17.246 < 2e-16 ***
origin 5.610e-01 2.622e-01 2.139 0.0331 *
displacement:weight 2.269e-05 2.257e-06 10.054 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.964 on 383 degrees of freedom
Multiple R-squared: 0.8588, Adjusted R-squared: 0.8558
F-statistic: 291.1 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: displacement:acceleration
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-8.129 -1.899 -0.135 1.755 12.119
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.005e+01 4.737e+00 -6.343 6.36e-10 ***
cylinders 2.136e-03 3.125e-01 0.007 0.994550
displacement 7.022e-02 1.005e-02 6.989 1.24e-11 ***
horsepower -5.515e-02 1.407e-02 -3.920 0.000105 ***
weight -4.211e-03 6.929e-04 -6.077 2.96e-09 ***
acceleration 7.530e-01 1.332e-01 5.653 3.09e-08 ***
year 7.722e-01 4.811e-02 16.051 < 2e-16 ***
origin 1.057e+00 2.671e-01 3.958 9.01e-05 ***
displacement:acceleration -4.855e-03 6.879e-04 -7.058 7.99e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.134 on 383 degrees of freedom
Multiple R-squared: 0.842, Adjusted R-squared: 0.8387
F-statistic: 255.2 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: displacement:year
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-7.9549 -1.9978 -0.0345 1.6232 12.2793
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.241e+01 7.811e+00 -7.990 1.60e-14 ***
cylinders 1.200e-02 3.133e-01 0.038 0.9695
displacement 2.710e-01 3.663e-02 7.399 8.78e-13 ***
horsepower -3.201e-02 1.318e-02 -2.429 0.0156 *
weight -5.860e-03 6.211e-04 -9.435 < 2e-16 ***
acceleration 1.059e-01 9.328e-02 1.136 0.2568
year 1.334e+00 9.636e-02 13.848 < 2e-16 ***
origin 1.230e+00 2.638e-01 4.662 4.34e-06 ***
displacement:year -3.491e-03 4.997e-04 -6.988 1.25e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.138 on 383 degrees of freedom
Multiple R-squared: 0.8417, Adjusted R-squared: 0.8384
F-statistic: 254.5 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: displacement:origin
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-11.2312 -2.0927 -0.1295 1.7287 12.2761
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.140e+01 4.827e+00 -4.434 1.21e-05 ***
cylinders -4.731e-01 3.204e-01 -1.477 0.140577
displacement 3.802e-02 9.768e-03 3.892 0.000117 ***
horsepower -1.523e-02 1.367e-02 -1.114 0.265920
weight -5.940e-03 6.723e-04 -8.835 < 2e-16 ***
acceleration 4.578e-02 9.868e-02 0.464 0.642939
year 7.682e-01 5.087e-02 15.103 < 2e-16 ***
origin 3.810e+00 8.762e-01 4.349 1.76e-05 ***
displacement:origin -2.262e-02 7.891e-03 -2.866 0.004383 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.297 on 383 degrees of freedom
Multiple R-squared: 0.8252, Adjusted R-squared: 0.8216
F-statistic: 226.1 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: horsepower:weight
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-8.589 -1.617 -0.184 1.541 12.001
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.876e+00 4.511e+00 0.638 0.524147
cylinders -2.955e-02 2.881e-01 -0.103 0.918363
displacement 5.950e-03 6.750e-03 0.881 0.378610
horsepower -2.313e-01 2.363e-02 -9.791 < 2e-16 ***
weight -1.121e-02 7.285e-04 -15.393 < 2e-16 ***
acceleration -9.019e-02 8.855e-02 -1.019 0.309081
year 7.695e-01 4.494e-02 17.124 < 2e-16 ***
origin 8.344e-01 2.513e-01 3.320 0.000986 ***
horsepower:weight 5.529e-05 5.227e-06 10.577 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.931 on 383 degrees of freedom
Multiple R-squared: 0.8618, Adjusted R-squared: 0.859
F-statistic: 298.6 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: horsepower:acceleration
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-9.0329 -1.8177 -0.1183 1.7247 12.4870
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -32.499820 4.923380 -6.601 1.36e-10 ***
cylinders 0.083489 0.316913 0.263 0.792350
displacement -0.007649 0.008161 -0.937 0.349244
horsepower 0.127188 0.024746 5.140 4.40e-07 ***
weight -0.003976 0.000716 -5.552 5.27e-08 ***
acceleration 0.983282 0.161513 6.088 2.78e-09 ***
year 0.755919 0.048179 15.690 < 2e-16 ***
origin 1.035733 0.268962 3.851 0.000138 ***
horsepower:acceleration -0.012139 0.001772 -6.851 2.93e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.145 on 383 degrees of freedom
Multiple R-squared: 0.841, Adjusted R-squared: 0.8376
F-statistic: 253.2 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: horsepower:year
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-8.6314 -1.8896 -0.0724 1.5781 11.6140
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.723e+01 1.008e+01 -9.648 < 2e-16 ***
cylinders 8.146e-02 3.027e-01 0.269 0.788
displacement 5.183e-03 7.072e-03 0.733 0.464
horsepower 8.070e-01 9.496e-02 8.498 4.33e-16 ***
weight -5.288e-03 6.112e-04 -8.651 < 2e-16 ***
acceleration -2.894e-02 9.121e-02 -0.317 0.751
year 1.830e+00 1.318e-01 13.887 < 2e-16 ***
origin 1.178e+00 2.558e-01 4.604 5.66e-06 ***
horsepower:year -1.140e-02 1.302e-03 -8.754 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.042 on 383 degrees of freedom
Multiple R-squared: 0.8512, Adjusted R-squared: 0.8481
F-statistic: 274 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: horsepower:origin
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-9.277 -1.875 -0.225 1.570 12.080
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.196e+01 4.396e+00 -4.996 8.94e-07 ***
cylinders -5.275e-01 3.028e-01 -1.742 0.0823 .
displacement -1.486e-03 7.607e-03 -0.195 0.8452
horsepower 8.173e-02 1.856e-02 4.404 1.38e-05 ***
weight -4.710e-03 6.555e-04 -7.186 3.52e-12 ***
acceleration -1.124e-01 9.617e-02 -1.168 0.2434
year 7.327e-01 4.780e-02 15.328 < 2e-16 ***
origin 7.695e+00 8.858e-01 8.687 < 2e-16 ***
horsepower:origin -7.955e-02 1.074e-02 -7.405 8.44e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.116 on 383 degrees of freedom
Multiple R-squared: 0.8438, Adjusted R-squared: 0.8406
F-statistic: 258.7 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: weight:acceleration
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-8.247 -2.048 -0.045 1.619 12.193
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.364e+01 5.811e+00 -7.511 4.18e-13 ***
cylinders -2.141e-01 3.078e-01 -0.696 0.487117
displacement 3.138e-03 7.495e-03 0.419 0.675622
horsepower -4.141e-02 1.348e-02 -3.071 0.002287 **
weight 4.027e-03 1.636e-03 2.462 0.014268 *
acceleration 1.629e+00 2.422e-01 6.726 6.36e-11 ***
year 7.821e-01 4.833e-02 16.184 < 2e-16 ***
origin 1.033e+00 2.686e-01 3.846 0.000141 ***
weight:acceleration -5.826e-04 8.408e-05 -6.928 1.81e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.141 on 383 degrees of freedom
Multiple R-squared: 0.8414, Adjusted R-squared: 0.838
F-statistic: 253.9 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: weight:year
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-7.9995 -1.8495 -0.1559 1.6061 11.7042
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.186e+02 1.338e+01 -8.864 < 2e-16 ***
cylinders -1.218e-01 3.032e-01 -0.402 0.6881
displacement 1.293e-02 7.019e-03 1.842 0.0663 .
horsepower -2.877e-02 1.286e-02 -2.236 0.0259 *
weight 3.044e-02 4.652e-03 6.543 1.94e-10 ***
acceleration 1.447e-01 9.196e-02 1.574 0.1164
year 2.084e+00 1.732e-01 12.033 < 2e-16 ***
origin 1.174e+00 2.597e-01 4.519 8.30e-06 ***
weight:year -4.879e-04 6.097e-05 -8.002 1.47e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.084 on 383 degrees of freedom
Multiple R-squared: 0.847, Adjusted R-squared: 0.8439
F-statistic: 265.1 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: weight:origin
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-10.6056 -2.0253 -0.1315 1.6295 12.5217
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.482e+01 4.891e+00 -5.075 6.06e-07 ***
cylinders -5.799e-01 3.171e-01 -1.829 0.0682 .
displacement 8.833e-03 7.810e-03 1.131 0.2588
horsepower -1.068e-02 1.358e-02 -0.787 0.4321
weight -2.782e-03 1.084e-03 -2.566 0.0107 *
acceleration 3.766e-02 9.729e-02 0.387 0.6989
year 7.632e-01 4.998e-02 15.270 < 2e-16 ***
origin 6.633e+00 1.265e+00 5.244 2.60e-07 ***
weight:origin -2.309e-03 5.477e-04 -4.215 3.11e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.257 on 383 degrees of freedom
Multiple R-squared: 0.8294, Adjusted R-squared: 0.8258
F-statistic: 232.7 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: acceleration:year
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-10.2277 -2.1428 -0.0839 1.8216 12.2997
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 97.6199866 20.8291407 4.687 3.86e-06 ***
cylinders -0.2163880 0.3148719 -0.687 0.49236
displacement 0.0087776 0.0074936 1.171 0.24219
horsepower -0.0269212 0.0133812 -2.012 0.04493 *
weight -0.0058738 0.0006363 -9.232 < 2e-16 ***
acceleration -7.1818103 1.2900806 -5.567 4.88e-08 ***
year -0.7461986 0.2696927 -2.767 0.00593 **
origin 1.2630333 0.2691459 4.693 3.76e-06 ***
acceleration:year 0.0945502 0.0167501 5.645 3.22e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.202 on 383 degrees of freedom
Multiple R-squared: 0.8352, Adjusted R-squared: 0.8317
F-statistic: 242.6 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: acceleration:origin
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-9.4106 -1.8805 -0.2471 1.7891 11.9680
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.3327273 5.0570077 -0.066 0.9476
cylinders -0.5881258 0.3063127 -1.920 0.0556 .
displacement 0.0086251 0.0073062 1.181 0.2385
horsepower -0.0250843 0.0131049 -1.914 0.0564 .
weight -0.0052351 0.0006439 -8.131 5.98e-15 ***
acceleration -1.0340600 0.1896960 -5.451 8.98e-08 ***
year 0.7623813 0.0482774 15.792 < 2e-16 ***
origin -9.3089774 1.6109675 -5.779 1.56e-08 ***
acceleration:origin 0.6546959 0.0969263 6.755 5.34e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.15 on 383 degrees of freedom
Multiple R-squared: 0.8405, Adjusted R-squared: 0.8371
F-statistic: 252.2 on 8 and 383 DF, p-value: < 2.2e-16
### Interaction Term: year:origin
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-8.6072 -2.0439 -0.0596 1.7121 12.3368
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.492e+00 9.044e+00 0.939 0.348353
cylinders -5.042e-01 3.192e-01 -1.579 0.115082
displacement 1.567e-02 7.530e-03 2.081 0.038060 *
horsepower -1.399e-02 1.364e-02 -1.025 0.305786
weight -6.352e-03 6.449e-04 -9.851 < 2e-16 ***
acceleration 9.185e-02 9.766e-02 0.941 0.347546
year 4.189e-01 1.125e-01 3.723 0.000226 ***
origin -1.405e+01 4.699e+00 -2.989 0.002978 **
year:origin 1.989e-01 6.030e-02 3.298 0.001064 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.286 on 383 degrees of freedom
Multiple R-squared: 0.8264, Adjusted R-squared: 0.8228
F-statistic: 227.9 on 8 and 383 DF, p-value: < 2.2e-16
# 下面这种方法弃用!!!
# 定义变量
variables <- c('cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'year', 'origin')
# 生成所有两两变量组合的交叉项
combinations <- combn(variables, 2, simplify = FALSE)
# 创建包含所有交叉项的公式
interaction_terms <- sapply(combinations, function(combo) paste(combo, collapse = ":"))
formula <- as.formula(paste("mpg ~ . -name +", paste(interaction_terms, collapse = " + ")))
# 拟合线性模型并输出摘要
lm.fit <- lm(formula, data = Auto)
summary(lm.fit)
Call:
lm(formula = formula, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-7.6303 -1.4481 0.0596 1.2739 11.1386
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.548e+01 5.314e+01 0.668 0.50475
cylinders 6.989e+00 8.248e+00 0.847 0.39738
displacement -4.785e-01 1.894e-01 -2.527 0.01192 *
horsepower 5.034e-01 3.470e-01 1.451 0.14769
weight 4.133e-03 1.759e-02 0.235 0.81442
acceleration -5.859e+00 2.174e+00 -2.696 0.00735 **
year 6.974e-01 6.097e-01 1.144 0.25340
origin -2.090e+01 7.097e+00 -2.944 0.00345 **
cylinders:displacement -3.383e-03 6.455e-03 -0.524 0.60051
cylinders:horsepower 1.161e-02 2.420e-02 0.480 0.63157
cylinders:weight 3.575e-04 8.955e-04 0.399 0.69000
cylinders:acceleration 2.779e-01 1.664e-01 1.670 0.09584 .
cylinders:year -1.741e-01 9.714e-02 -1.793 0.07389 .
cylinders:origin 4.022e-01 4.926e-01 0.816 0.41482
displacement:horsepower -8.491e-05 2.885e-04 -0.294 0.76867
displacement:weight 2.472e-05 1.470e-05 1.682 0.09342 .
displacement:acceleration -3.479e-03 3.342e-03 -1.041 0.29853
displacement:year 5.934e-03 2.391e-03 2.482 0.01352 *
displacement:origin 2.398e-02 1.947e-02 1.232 0.21875
horsepower:weight -1.968e-05 2.924e-05 -0.673 0.50124
horsepower:acceleration -7.213e-03 3.719e-03 -1.939 0.05325 .
horsepower:year -5.838e-03 3.938e-03 -1.482 0.13916
horsepower:origin 2.233e-03 2.930e-02 0.076 0.93931
weight:acceleration 2.346e-04 2.289e-04 1.025 0.30596
weight:year -2.245e-04 2.127e-04 -1.056 0.29182
weight:origin -5.789e-04 1.591e-03 -0.364 0.71623
acceleration:year 5.562e-02 2.558e-02 2.174 0.03033 *
acceleration:origin 4.583e-01 1.567e-01 2.926 0.00365 **
year:origin 1.393e-01 7.399e-02 1.882 0.06062 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.695 on 363 degrees of freedom
Multiple R-squared: 0.8893, Adjusted R-squared: 0.8808
F-statistic: 104.2 on 28 and 363 DF, p-value: < 2.2e-16
# f
# 此处我们以horsepower为例,进行非线性变换拟合
par(mfrow = c(2, 2))
plot(Auto$horsepower, Auto$mpg)
plot(log(Auto$horsepower), Auto$mpg)
plot(sqrt(Auto$horsepower), Auto$mpg)
plot(Auto$horsepower^2, Auto$mpg)
par(mfrow = c(2, 2))
plot(Auto$horsepower, Auto$mpg, cex = 0.2)
plot(log(Auto$horsepower), Auto$mpg, cex = 0.2)
plot(sqrt(Auto$horsepower), Auto$mpg, cex = 0.2)
plot(Auto$horsepower^2, Auto$mpg, cex = 0.2)
# 此处我们仅以log转换为例
Auto %>% mutate(horsepower=log(horsepower))
fit.new <- lm(mpg ~ . -name,data = Auto)
summary(fit.new)
par(mfrow = c(2, 2))
plot(fit.new)
Auto <- Auto %>% mutate(horsepower=log(horsepower))
fit.new <- lm(mpg ~ . -name,data = Auto)
summary(fit.new)
par(mfrow = c(2, 2))
plot(fit.new)
| mpg | cylinders | displacement | horsepower | weight | acceleration | year | origin | name | |
|---|---|---|---|---|---|---|---|---|---|
| <dbl> | <int> | <dbl> | <dbl> | <int> | <dbl> | <int> | <int> | <fct> | |
| 1 | 18 | 8 | 307 | 4.867534 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 2 | 15 | 8 | 350 | 5.105945 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 3 | 18 | 8 | 318 | 5.010635 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 4 | 16 | 8 | 304 | 5.010635 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 5 | 17 | 8 | 302 | 4.941642 | 3449 | 10.5 | 70 | 1 | ford torino |
| 6 | 15 | 8 | 429 | 5.288267 | 4341 | 10.0 | 70 | 1 | ford galaxie 500 |
| 7 | 14 | 8 | 454 | 5.393628 | 4354 | 9.0 | 70 | 1 | chevrolet impala |
| 8 | 14 | 8 | 440 | 5.370638 | 4312 | 8.5 | 70 | 1 | plymouth fury iii |
| 9 | 14 | 8 | 455 | 5.416100 | 4425 | 10.0 | 70 | 1 | pontiac catalina |
| 10 | 15 | 8 | 390 | 5.247024 | 3850 | 8.5 | 70 | 1 | amc ambassador dpl |
| 11 | 15 | 8 | 383 | 5.135798 | 3563 | 10.0 | 70 | 1 | dodge challenger se |
| 12 | 14 | 8 | 340 | 5.075174 | 3609 | 8.0 | 70 | 1 | plymouth 'cuda 340 |
| 13 | 15 | 8 | 400 | 5.010635 | 3761 | 9.5 | 70 | 1 | chevrolet monte carlo |
| 14 | 14 | 8 | 455 | 5.416100 | 3086 | 10.0 | 70 | 1 | buick estate wagon (sw) |
| 15 | 24 | 4 | 113 | 4.553877 | 2372 | 15.0 | 70 | 3 | toyota corona mark ii |
| 16 | 22 | 6 | 198 | 4.553877 | 2833 | 15.5 | 70 | 1 | plymouth duster |
| 17 | 18 | 6 | 199 | 4.574711 | 2774 | 15.5 | 70 | 1 | amc hornet |
| 18 | 21 | 6 | 200 | 4.442651 | 2587 | 16.0 | 70 | 1 | ford maverick |
| 19 | 27 | 4 | 97 | 4.477337 | 2130 | 14.5 | 70 | 3 | datsun pl510 |
| 20 | 26 | 4 | 97 | 3.828641 | 1835 | 20.5 | 70 | 2 | volkswagen 1131 deluxe sedan |
| 21 | 25 | 4 | 110 | 4.465908 | 2672 | 17.5 | 70 | 2 | peugeot 504 |
| 22 | 24 | 4 | 107 | 4.499810 | 2430 | 14.5 | 70 | 2 | audi 100 ls |
| 23 | 25 | 4 | 104 | 4.553877 | 2375 | 17.5 | 70 | 2 | saab 99e |
| 24 | 26 | 4 | 121 | 4.727388 | 2234 | 12.5 | 70 | 2 | bmw 2002 |
| 25 | 21 | 6 | 199 | 4.499810 | 2648 | 15.0 | 70 | 1 | amc gremlin |
| 26 | 10 | 8 | 360 | 5.370638 | 4615 | 14.0 | 70 | 1 | ford f250 |
| 27 | 10 | 8 | 307 | 5.298317 | 4376 | 15.0 | 70 | 1 | chevy c20 |
| 28 | 11 | 8 | 318 | 5.347108 | 4382 | 13.5 | 70 | 1 | dodge d200 |
| 29 | 9 | 8 | 304 | 5.262690 | 4732 | 18.5 | 70 | 1 | hi 1200d |
| 30 | 27 | 4 | 97 | 4.477337 | 2130 | 14.5 | 71 | 3 | datsun pl510 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 368 | 28 | 4 | 112 | 4.477337 | 2605 | 19.6 | 82 | 1 | chevrolet cavalier |
| 369 | 27 | 4 | 112 | 4.477337 | 2640 | 18.6 | 82 | 1 | chevrolet cavalier wagon |
| 370 | 34 | 4 | 112 | 4.477337 | 2395 | 18.0 | 82 | 1 | chevrolet cavalier 2-door |
| 371 | 31 | 4 | 112 | 4.442651 | 2575 | 16.2 | 82 | 1 | pontiac j2000 se hatchback |
| 372 | 29 | 4 | 135 | 4.430817 | 2525 | 16.0 | 82 | 1 | dodge aries se |
| 373 | 27 | 4 | 151 | 4.499810 | 2735 | 18.0 | 82 | 1 | pontiac phoenix |
| 374 | 24 | 4 | 140 | 4.521789 | 2865 | 16.4 | 82 | 1 | ford fairmont futura |
| 375 | 36 | 4 | 105 | 4.304065 | 1980 | 15.3 | 82 | 2 | volkswagen rabbit l |
| 376 | 37 | 4 | 91 | 4.219508 | 2025 | 18.2 | 82 | 3 | mazda glc custom l |
| 377 | 31 | 4 | 91 | 4.219508 | 1970 | 17.6 | 82 | 3 | mazda glc custom |
| 378 | 38 | 4 | 105 | 4.143135 | 2125 | 14.7 | 82 | 1 | plymouth horizon miser |
| 379 | 36 | 4 | 98 | 4.248495 | 2125 | 17.3 | 82 | 1 | mercury lynx l |
| 380 | 36 | 4 | 120 | 4.477337 | 2160 | 14.5 | 82 | 3 | nissan stanza xe |
| 381 | 36 | 4 | 107 | 4.317488 | 2205 | 14.5 | 82 | 3 | honda accord |
| 382 | 34 | 4 | 108 | 4.248495 | 2245 | 16.9 | 82 | 3 | toyota corolla |
| 383 | 38 | 4 | 91 | 4.204693 | 1965 | 15.0 | 82 | 3 | honda civic |
| 384 | 32 | 4 | 91 | 4.204693 | 1965 | 15.7 | 82 | 3 | honda civic (auto) |
| 385 | 38 | 4 | 91 | 4.204693 | 1995 | 16.2 | 82 | 3 | datsun 310 gx |
| 386 | 25 | 6 | 181 | 4.700480 | 2945 | 16.4 | 82 | 1 | buick century limited |
| 387 | 38 | 6 | 262 | 4.442651 | 3015 | 17.0 | 82 | 1 | oldsmobile cutlass ciera (diesel) |
| 388 | 26 | 4 | 156 | 4.521789 | 2585 | 14.5 | 82 | 1 | chrysler lebaron medallion |
| 389 | 22 | 6 | 232 | 4.718499 | 2835 | 14.7 | 82 | 1 | ford granada l |
| 390 | 32 | 4 | 144 | 4.564348 | 2665 | 13.9 | 82 | 3 | toyota celica gt |
| 391 | 36 | 4 | 135 | 4.430817 | 2370 | 13.0 | 82 | 1 | dodge charger 2.2 |
| 392 | 27 | 4 | 151 | 4.499810 | 2950 | 17.3 | 82 | 1 | chevrolet camaro |
| 393 | 27 | 4 | 140 | 4.454347 | 2790 | 15.6 | 82 | 1 | ford mustang gl |
| 394 | 44 | 4 | 97 | 3.951244 | 2130 | 24.6 | 82 | 2 | vw pickup |
| 395 | 32 | 4 | 135 | 4.430817 | 2295 | 11.6 | 82 | 1 | dodge rampage |
| 396 | 28 | 4 | 120 | 4.369448 | 2625 | 18.6 | 82 | 1 | ford ranger |
| 397 | 31 | 4 | 119 | 4.406719 | 2720 | 19.4 | 82 | 1 | chevy s-10 |
Call:
lm(formula = mpg ~ . - name, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-9.5903 -2.1565 -0.1169 1.8690 13.0604
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.218435 4.644294 -3.707 0.00024 ***
cylinders -0.493376 0.323282 -1.526 0.12780
displacement 0.019896 0.007515 2.647 0.00844 **
horsepower -0.016951 0.013787 -1.230 0.21963
weight -0.006474 0.000652 -9.929 < 2e-16 ***
acceleration 0.080576 0.098845 0.815 0.41548
year 0.750773 0.050973 14.729 < 2e-16 ***
origin 1.426141 0.278136 5.127 4.67e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.328 on 384 degrees of freedom
Multiple R-squared: 0.8215, Adjusted R-squared: 0.8182
F-statistic: 252.4 on 7 and 384 DF, p-value: < 2.2e-16
Call:
lm(formula = mpg ~ . - name, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-9.3115 -2.0041 -0.1726 1.8393 12.6579
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.254005 8.589614 3.173 0.00163 **
cylinders -0.486206 0.306692 -1.585 0.11372
displacement 0.019456 0.006876 2.830 0.00491 **
horsepower -9.506436 1.539619 -6.175 1.69e-09 ***
weight -0.004266 0.000694 -6.148 1.97e-09 ***
acceleration -0.292088 0.103804 -2.814 0.00515 **
year 0.705329 0.048456 14.556 < 2e-16 ***
origin 1.482435 0.259347 5.716 2.19e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.18 on 384 degrees of freedom
Multiple R-squared: 0.837, Adjusted R-squared: 0.834
F-statistic: 281.6 on 7 and 384 DF, p-value: < 2.2e-16
# 可以发现在log转变之后horsepower似乎与mpg更呈现线性关系了,并且p值更加显著了
13,
set.seed(1)
# a
X <- rnorm(n = 100,mean = 0,sd = 1)
print(X)
[1] -0.626453811 0.183643324 -0.835628612 1.595280802 0.329507772 [6] -0.820468384 0.487429052 0.738324705 0.575781352 -0.305388387 [11] 1.511781168 0.389843236 -0.621240581 -2.214699887 1.124930918 [16] -0.044933609 -0.016190263 0.943836211 0.821221195 0.593901321 [21] 0.918977372 0.782136301 0.074564983 -1.989351696 0.619825748 [26] -0.056128740 -0.155795507 -1.470752384 -0.478150055 0.417941560 [31] 1.358679552 -0.102787727 0.387671612 -0.053805041 -1.377059557 [36] -0.414994563 -0.394289954 -0.059313397 1.100025372 0.763175748 [41] -0.164523596 -0.253361680 0.696963375 0.556663199 -0.688755695 [46] -0.707495157 0.364581962 0.768532925 -0.112346212 0.881107726 [51] 0.398105880 -0.612026393 0.341119691 -1.129363096 1.433023702 [56] 1.980399899 -0.367221476 -1.044134626 0.569719627 -0.135054604 [61] 2.401617761 -0.039240003 0.689739362 0.028002159 -0.743273209 [66] 0.188792300 -1.804958629 1.465554862 0.153253338 2.172611670 [71] 0.475509529 -0.709946431 0.610726353 -0.934097632 -1.253633400 [76] 0.291446236 -0.443291873 0.001105352 0.074341324 -0.589520946 [81] -0.568668733 -0.135178615 1.178086997 -1.523566800 0.593946188 [86] 0.332950371 1.063099837 -0.304183924 0.370018810 0.267098791 [91] -0.542520031 1.207867806 1.160402616 0.700213650 1.586833455 [96] 0.558486426 -1.276592208 -0.573265414 -1.224612615 -0.473400636
# b
eps <- rnorm(n = 100,mean = 0,sd = sqrt(0.25)) # 注意这里是sd = sqrt(0.25)而不是0.25
print(eps)
[1] 0.204700920 0.844436643 0.793294217 -0.165453900 -1.142617768 [6] 1.248830795 0.333533083 0.270663668 -0.006699762 0.255054211 [11] -0.082187916 0.210347322 -0.200123372 -0.685103939 0.493919134 [16] 0.759872513 -0.154370285 -0.626644878 0.321120653 -0.022354568 [21] -0.866609203 0.001065930 -0.315150167 -0.170484290 -0.578286181 [26] 0.901570954 -0.165566018 -0.802756706 0.098596719 0.131587823 [31] -0.492913350 -1.444460336 -0.320240851 0.285253818 -0.029861638 [36] -0.049089372 0.280410364 -0.593229319 0.548388522 -0.002672014 [41] 0.353655334 0.517053867 0.111740207 -0.439353806 0.581482278 [46] -1.000082472 -0.272395370 -0.127835355 -0.083060518 0.510231954 [51] 0.068110947 0.203583802 -0.034827407 -0.123832171 0.347775403 [56] 0.573114179 -1.201548107 0.286369778 0.187362203 -0.212633861 [61] 0.475506404 -0.194618591 -0.142165331 0.428704889 0.859813650 [66] 0.135027450 -0.211092005 -0.594556647 -0.165516489 -0.469914663 [71] -0.129466292 0.197189584 -0.425928546 1.324583441 0.078005838 [76] 0.565103634 -1.144561990 0.370500579 -0.658122580 0.459901839 [81] 0.199065078 -0.203764290 0.662129315 -0.350615835 -0.290307152 [86] -0.500536091 -0.334089303 0.472592477 0.216851075 0.502579609 [91] -0.195059332 0.188185146 0.122082462 -0.713128671 0.889214644 [96] 0.067223830 0.382799500 0.477568338 -0.025282851 -0.152907710
# c
Y <- -1 + 0.5 * X + eps
print(Y)
length(Y) # 向量长100
[1] -1.10852599 -0.06374169 -0.62452009 -0.36781350 -1.97786388 -0.16140340 [7] -0.42275239 -0.36017398 -0.71880909 -0.89763998 -0.32629733 -0.59473106 [13] -1.51074366 -2.79245388 0.05638459 -0.26259429 -1.16246542 -1.15472677 [19] -0.26826875 -0.72540391 -1.40712052 -0.60786592 -1.27786768 -2.16516014 [25] -1.26837331 -0.12649342 -1.24346377 -2.53813290 -1.14047831 -0.65944140 [31] -0.81357357 -2.49585420 -1.12640505 -0.74164870 -1.71839142 -1.25658665 [37] -0.91673461 -1.62288602 0.09840121 -0.62108414 -0.72860646 -0.60962697 [43] -0.53977810 -1.16102221 -0.76289557 -2.35383005 -1.09010439 -0.74356889 [49] -1.13923362 -0.04921418 -0.73283611 -1.10242939 -0.86426756 -1.68851372 [55] 0.06428725 0.56331413 -2.38515885 -1.23569754 -0.52777798 -1.28016116 [61] 0.67631528 -1.21423859 -0.79729565 -0.55729403 -0.51182295 -0.77057640 [67] -2.11357132 -0.86177922 -1.08888982 -0.38360883 -0.89171153 -1.15778363 [73] -1.12056537 -0.14246538 -1.54881086 -0.28917325 -2.36620793 -0.62894675 [79] -1.62095192 -0.83485863 -1.08526929 -1.27135360 0.25117281 -2.11239923 [85] -0.99333406 -1.33406090 -0.80253938 -0.67949949 -0.59813952 -0.36387100 [91] -1.46631935 -0.20788095 -0.29771623 -1.36302185 0.68263137 -0.65353296 [97] -1.25549660 -0.80906437 -1.63758916 -1.38960803
# 我们开始拟合模型
lm.fit <- lm(Y ~ X)
summary(lm.fit)
# (Intercept) -0.98632,对应的就是截距的估计值,X 0.51058,对应的就是斜率的估计值
Call:
lm(formula = Y ~ X)
Residuals:
Min 1Q Median 3Q Max
-1.45706 -0.24115 -0.02266 0.32462 1.32079
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.98632 0.05235 -18.840 < 2e-16 ***
X 0.51058 0.05815 8.781 5.34e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5197 on 98 degrees of freedom
Multiple R-squared: 0.4403, Adjusted R-squared: 0.4346
F-statistic: 77.1 on 1 and 98 DF, p-value: 5.336e-14
# d
plot(X,Y)
# 从图上看起来,似乎是线性的
# e
# 我们开始拟合模型
lm.fit <- lm(Y ~ X)
summary(lm.fit)
# 看sd的话,两个系数拟合的挺好的
# 模型的系数估计值都非常显著,标准误较小,说明模型拟合良好。
# 截距项和 X 的系数都显著不为零,说明 X 对 y 有显著的线性影响。
Call:
lm(formula = Y ~ X)
Residuals:
Min 1Q Median 3Q Max
-1.45706 -0.24115 -0.02266 0.32462 1.32079
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.98632 0.05235 -18.840 < 2e-16 ***
X 0.51058 0.05815 8.781 5.34e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5197 on 98 degrees of freedom
Multiple R-squared: 0.4403, Adjusted R-squared: 0.4346
F-statistic: 77.1 on 1 and 98 DF, p-value: 5.336e-14
# f
plot(X, Y)
abline(lm.fit,col="red")
abline(-0.98632, 0.51058, col = "red", lty = 2)
legend("topleft",
c("model fit", "population regression"),
col = c("black", "red"),
lty = c(1, 2)
)
# g
lm.fit <- lm(Y ~ X)
summary(lm.fit)
fit.2 <- lm(Y ~ X + I(X^2))
summary(fit.2)
# 或者
fit.22 <- lm(Y ~ poly(X, 2,raw = TRUE))
summary(fit.22)
anova(fit.2,lm.fit)
anova(fit.22,lm.fit)
# F检验都不显著,说明模型的拟合效果并没有提升
Call:
lm(formula = Y ~ X)
Residuals:
Min 1Q Median 3Q Max
-1.45706 -0.24115 -0.02266 0.32462 1.32079
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.98632 0.05235 -18.840 < 2e-16 ***
X 0.51058 0.05815 8.781 5.34e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5197 on 98 degrees of freedom
Multiple R-squared: 0.4403, Adjusted R-squared: 0.4346
F-statistic: 77.1 on 1 and 98 DF, p-value: 5.336e-14
Call:
lm(formula = Y ~ X + I(X^2))
Residuals:
Min 1Q Median 3Q Max
-1.46371 -0.24761 -0.01792 0.32978 1.32271
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.979434 0.064138 -15.271 < 2e-16 ***
X 0.511912 0.058865 8.696 8.71e-14 ***
I(X^2) -0.008668 0.046208 -0.188 0.852
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5223 on 97 degrees of freedom
Multiple R-squared: 0.4405, Adjusted R-squared: 0.429
F-statistic: 38.19 on 2 and 97 DF, p-value: 5.856e-13
Call:
lm(formula = Y ~ poly(X, 2, raw = TRUE))
Residuals:
Min 1Q Median 3Q Max
-1.46371 -0.24761 -0.01792 0.32978 1.32271
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.979434 0.064138 -15.271 < 2e-16 ***
poly(X, 2, raw = TRUE)1 0.511912 0.058865 8.696 8.71e-14 ***
poly(X, 2, raw = TRUE)2 -0.008668 0.046208 -0.188 0.852
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5223 on 97 degrees of freedom
Multiple R-squared: 0.4405, Adjusted R-squared: 0.429
F-statistic: 38.19 on 2 and 97 DF, p-value: 5.856e-13
| Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) | |
|---|---|---|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | |
| 1 | 97 | 26.45666 | NA | NA | NA | NA |
| 2 | 98 | 26.46626 | -1 | -0.009598243 | 0.03519075 | 0.8515884 |
| Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) | |
|---|---|---|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | |
| 1 | 97 | 26.45666 | NA | NA | NA | NA |
| 2 | 98 | 26.46626 | -1 | -0.009598243 | 0.03519075 | 0.8515884 |
# h
# 比如说我们拟合方差为0.05的误差的模型
x <- rnorm(100, 0, 1)
y <- -1 + 0.5 * x + rnorm(100, 0, sqrt(0.05))
fit2 <- lm(y ~ x)
summary(fit2)
plot(x, y)
abline(fit2)
abline(-1, 0.5, col = "red", lty = 2)
legend("topleft",
c("model fit", "population regression"),
col = c("black", "red"),
lty = c(1, 2)
)
# 可以看出,数据变异性减小,R2变高了
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-0.6547 -0.1600 -0.0127 0.1452 0.8618
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.00908 0.02631 -38.35 <2e-16 ***
x 0.50630 0.02663 19.01 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2628 on 98 degrees of freedom
Multiple R-squared: 0.7867, Adjusted R-squared: 0.7845
F-statistic: 361.4 on 1 and 98 DF, p-value: < 2.2e-16
# i
# 噪声越高,我们就拟合1
x <- rnorm(100, 0, 1)
y <- -1 + 0.5 * x + rnorm(100, 0, 1)
fit3 <- lm(y ~ x)
summary(fit3)
plot(x, y)
abline(fit3)
abline(-1, 0.5, col = "red", lty = 2)
legend("topleft",
c("model fit", "population regression"),
col = c("black", "red"),
lty = c(1, 2)
)
# 可以看出,数据变异性增大,R2变低了
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-2.73419 -0.80105 -0.05787 0.74623 2.87841
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.1992 0.1087 -11.027 < 2e-16 ***
x 0.5086 0.1130 4.501 1.87e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.086 on 98 degrees of freedom
Multiple R-squared: 0.1713, Adjusted R-squared: 0.1628
F-statistic: 20.26 on 1 and 98 DF, p-value: 1.866e-05
# j
confint(lm.fit)
confint(fit2)
confint(fit3)
# 误差越小,置信区间越小
| 2.5 % | 97.5 % | |
|---|---|---|
| (Intercept) | -1.0902064 | -0.8824249 |
| X | 0.3951885 | 0.6259784 |
| 2.5 % | 97.5 % | |
|---|---|---|
| (Intercept) | -1.0612967 | -0.9568549 |
| x | 0.4534481 | 0.5591528 |
| 2.5 % | 97.5 % | |
|---|---|---|
| (Intercept) | -1.4150018 | -0.9833927 |
| x | 0.2843477 | 0.7328345 |